370 Hours - Malay Speech Data by Mobile Phone
675 Malaysians native speakers participated in the recording with authentic accent. The recorded script is designed by linguists and cover a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches with mainstream Android and Apple system phones. The data set can be applied for automatic speech recognition, and machine translation scenes.
502 Hours - Chinese Speaking English Speech Data by Mobile Phone
1,279 Chinese speakers from major dialect regions participated in the recording, it is in line with the specific accent of Chinese English speakers. The recorded script cover many categories such as spoken English, speech, and human-computer interaction, rich in content, extensive in fields, and balanced in phonemes. It can be used to improve the recognition effect of the automatic speech recognition system on Chinese people speaking English.
Female audio data of adults imitating children, 6599 sentences in total and 6.78 hours. It is recorded by Chinese native speakers, with authentic accent and sweet sound. The phoneme coverage is balanced. Professional phonetician participates in the annotation. It precisely matches with the research and development needs of the speech synthesis.