
245 Hours – Mandarin Speech Data in Cars by Mobile Phone
- 695 participants
- 300 sentences per speaker
- 16 kHz, 16-bit, WAV
Datatang is certified under ISO 27001 (Information Security Management System) and ISO 9001 (Quality Management System).


Data Introduction
695 native Chinese speakers from many regions of China took part in the recording, yielding 245 hours of valid data. All recordings were made inside cars and cover a range of conditions, including different road types, different vehicle models, windows open or closed, and music on or off.
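The headline figures (695 speakers, 300 sentences each, 245 hours, 16 kHz 16-bit mono audio) can be combined into a few rough planning numbers. The sketch below assumes that every speaker recorded exactly 300 sentences and that the 245 hours are spread evenly across them. Because "valid data" may exclude trimmed silence, treat the results as approximations rather than published figures.

```python
# Back-of-the-envelope figures derived from the dataset's headline numbers.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CHANNELS = 1                # mono

TOTAL_HOURS = 245
SPEAKERS = 695
SENTENCES_PER_SPEAKER = 300  # assumption: every speaker recorded exactly 300


def raw_pcm_bytes(hours: float) -> int:
    """Size of uncompressed PCM audio for the given duration (no WAV headers)."""
    seconds = hours * 3600
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


total_sentences = SPEAKERS * SENTENCES_PER_SPEAKER
avg_seconds = TOTAL_HOURS * 3600 / total_sentences
pcm_gb = raw_pcm_bytes(TOTAL_HOURS) / 1e9

print(f"utterances: {total_sentences}")                            # 208500
print(f"average utterance length: {avg_seconds:.2f} s")            # 4.23 s
print(f"raw PCM size: {pcm_gb:.1f} GB (excluding WAV headers)")    # 28.2 GB
```

At 16 kHz and 16 bits per sample, mono audio uses 32,000 bytes per second, which is why 245 hours comes to roughly 28 GB before per-file WAV headers are counted.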
Data Specification
- Format
- 16 kHz, 16-bit, uncompressed WAV, mono
- Recording Environment
- In-car
- Recording Content
- customer service inquiries (covering 30 domains); text messages; news
- Speaker
- 695 people, of whom 53% are female
- Device
- Android mobile phone
- Language
- Mandarin
- Transcription Content
- text, noise symbols
- Accuracy Rate
- 95% (noise symbols are excluded from the accuracy calculation)
- Application Scenarios
- speech recognition, voiceprint recognition
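Before training, it can help to confirm that each audio file matches the format listed above (16 kHz, 16-bit, uncompressed, mono). The sketch below uses only Python's standard `wave` module. The function names `check_wav`, `duration_seconds`, and `write_silence` are hypothetical helpers for illustration, not part of any Datatang tooling, and the directory layout of the delivered data is not assumed.

```python
import tempfile
import wave
from pathlib import Path

# Target format taken from the specification above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1      # mono


def check_wav(path: Path) -> list[str]:
    """Return mismatches against the spec; an empty list means the file conforms."""
    try:
        with wave.open(str(path), "rb") as wf:
            problems = []
            if wf.getframerate() != EXPECTED_RATE_HZ:
                problems.append(f"sample rate {wf.getframerate()} Hz")
            if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
                problems.append(f"sample width {8 * wf.getsampwidth()} bit")
            if wf.getnchannels() != EXPECTED_CHANNELS:
                problems.append(f"{wf.getnchannels()} channels")
            if wf.getcomptype() != "NONE":
                problems.append(f"compression {wf.getcomptype()}")
            return problems
    except (wave.Error, EOFError) as exc:
        # The wave module only parses PCM WAV; other or truncated files end up here.
        return [f"not a readable PCM WAV file: {exc}"]


def duration_seconds(path: Path) -> float:
    """Length of a WAV file in seconds, from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_silence(path: Path, seconds: float, channels: int = 1) -> None:
    """Write a synthetic 16 kHz, 16-bit file of silence (demo helper only)."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(EXPECTED_SAMPLE_WIDTH)
        wf.setframerate(EXPECTED_RATE_HZ)
        n_frames = int(seconds * EXPECTED_RATE_HZ)
        wf.writeframes(b"\x00\x00" * channels * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "mono.wav"
        bad = Path(tmp) / "stereo.wav"
        write_silence(good, 2.0)
        write_silence(bad, 2.0, channels=2)
        print(good.name, check_wav(good), duration_seconds(good))  # mono.wav [] 2.0
        print(bad.name, check_wav(bad))                            # stereo.wav ['2 channels']
```

Returning a list of mismatches instead of raising on the first one makes it easy to scan a whole directory and report every non-conforming file in one pass.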
Samples
- 不好笑下一个。我要听黄家驹海阔天空 (Not funny, next one. I want to hear Wong Ka Kui's "Boundless Oceans, Vast Skies".)
- 好了。睡你的觉。我上班 (All right. Go back to sleep. I'm off to work.)
- 胡萝卜汁能提高人的食欲和对感染的抵抗力 (Carrot juice can improve appetite and resistance to infection.)
- 你帮我打一个电话吧 (Make a phone call for me.)
- 明天山西长治地区什么天气 (What will the weather be like in the Changzhi area of Shanxi tomorrow?)