1,026 Hours - Mobile Telephony Strong Accented Mandarin Speech Data
More than 2,000 native Chinese speakers participated in the recording, with an equal gender balance. Speakers are mainly from southern China, with some from provinces of northern China with strong accents. The recording content is rich, covering mobile phone voice assistant interaction, smart home command and control, in-car command and control, numbers and other fields, accurately matching smart home, intelligent car and other practical application scenarios.
535 Hours – German Speaking English Speech Data
1,162 native German speakers participated in the recording, with authentic accents. The recorded script was designed by linguists and covers a wide range of topics, including generic command and control, human-machine interaction, smart home command and control, and in-car command and control. The text is manually proofread to ensure high accuracy. The data matches mainstream Android phones and iPhones. The data set can be used for automatic speech recognition, voiceprint recognition model training, corpus construction for machine translation, and algorithm research.
520 Hours – French Speaking English Speech Data
1,089 native French speakers participated in the recording, with authentic accents. The recorded script was designed by linguists and covers a wide range of topics, including generic, interactive, in-car and home categories. The text is manually proofread with high accuracy. The data matches mainstream Android phones and iPhones. The data set can be used for automatic speech recognition and machine translation scenarios.
824 Hours - Hindi Speech Data by Mobile Phone
The data set contains 824 hours of speech recorded by 1,500 native speakers from India, with authentic accents. The recording text was designed by language experts and covers general, interactive, in-car, home and other categories. The text is manually proofread with high accuracy. Recording devices are mainstream Android phones and iPhones. The data can be applied to speech recognition, machine translation, and voiceprint recognition.
797 People – Young Children Chinese Speech Data
The data were recorded by 797 Chinese children aged 3 to 5, of whom 39% were aged 5. The recording content suits the characteristics of children and consists mainly of storybooks, children's songs, and spoken language, with around 120 sentences per speaker. Audio was recorded simultaneously by hi-fi microphone and cellphone. The valid data total 41.8 hours. Texts are manually transcribed with high accuracy.
156 Speakers - Mobile Malay Speech Data
156 Speakers - Mobile Telephony Malay Speech Data_Reading was recorded by native Malay speakers in a quiet environment. The recording is rich in content, covering multiple categories such as economy, entertainment, news, spoken language, numbers, and letters, with around 450 sentences per speaker. The valid duration is 135 hours. All texts are manually transcribed to ensure high accuracy.
Malay speech, Malaysia, Mobile phone, Reading
3,000 Hours - Chinese Children Speech Data by Mobile Phone
Audio data of Chinese children captured by mobile phone, with a total duration of 3,000 hours. The 14,000 speakers are children aged 6 to 12, with accents covering seven dialect areas. The recorded text contains language commonly used by children, such as essays, stories, numbers, and interactions in cars, at home, and with voice assistants, precisely matching actual application scenarios. All sentences are manually transcribed with high accuracy.
CUSTOMIZED COLLECTION & ANNOTATION SERVICES
1,000,000+ crowdsourced contributors for complex and professional projects