1,026 Hours – Mobile Telephony Strongly Accented Mandarin Speech Data
More than 2,000 native Chinese speakers took part in the recording, with an equal gender split. Speakers are mainly from southern China, and some are from northern provinces with strong accents. The recorded content is rich, covering mobile voice-assistant interaction, smart-home command and control, in-car command and control, numbers, and other domains, and it closely matches practical application scenarios such as smart homes and intelligent vehicles.
535 Hours – German Speaking English Speech Data
1,162 native German speakers recorded the data with authentic accents. The scripts were designed by linguists and cover a wide range of topics, including generic command and control, human-machine interaction, smart-home command and control, and in-car command and control. The text was manually proofread to ensure high accuracy. Recording devices are mainstream Android phones and iPhones. The dataset can be used for automatic speech recognition, voiceprint recognition model training, machine translation corpus construction, and algorithm research.
520 Hours – French Speaking English Speech Data
1,089 native French speakers took part in the recording with authentic accents. The scripts were designed by linguists and cover a wide range of topics, including generic, interactive, in-car, and home domains. The text was manually proofread for high accuracy. Recording devices are mainstream Android phones and iPhones. The dataset can be used for automatic speech recognition and machine translation.
824 Hours – Hindi Speech Data by Mobile Phone
The dataset contains 824 hours of speech recorded by 1,500 native speakers from India, with authentic accents. The recording text was designed by language experts and covers general, interactive, in-car, home, and other categories. The text was manually proofread for high accuracy. Recording devices are mainstream Android phones and iPhones. The data can be used for speech recognition, machine translation, and voiceprint recognition.
1,066 People – Face Anti-Spoofing Data
Face liveness detection data collected indoors and outdoors from 1,066 people. Subjects include both males and females, with ages ranging from juveniles to the elderly, mostly young and middle-aged. The data includes multiple poses, multiple expressions, and multiple spoofing-attack samples. The dataset can be used for tasks such as face payment, remote identity authentication, and face unlock on mobile phones.
Key features: human face, various postures, expressions, scenes, anti-spoofing samples.
797 People – Young Children Chinese Speech Data
The data were recorded by 797 Chinese children aged 3 to 5, 39% of whom were 5 years old. The recording content is suited to children, consisting mainly of storybooks, children's songs, and spoken language, with around 120 sentences per speaker. Each session was recorded simultaneously with a hi-fi microphone and a mobile phone. The valid data totals 41.8 hours. Texts were manually transcribed with high accuracy.
CUSTOMIZED COLLECTION & ANNOTATION SERVICES
1,000,000+ crowdsourced workers to perform complex, professional projects