
1,535 Hours - Mixed Speech with Chinese and English Data by Mobile Phone
- 3,972 speakers
- seven major dialect areas
- sentences mixing Chinese and English
Datatang has passed the certification of ISO27001 Information Security Management System and ISO9001 Quality Management System.


Data Introduction
The data was recorded by 3,972 native Chinese speakers whose accents cover seven major dialect areas. The recorded text consists of sentences that mix Chinese and English, covering general scenes and human-computer interaction scenes. The content is varied and the transcriptions are accurate. The data can be used to improve a speech recognition system's accuracy on read speech that mixes Chinese and English.
Data Specification
- Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
- Recording environment: quiet indoor environment, without echo
- Recording content (read speech): general category; human-machine interaction category
- Demographics: 3,972 speakers in total; 43% male and 57% female; 68% aged 12-25, 31% aged 26-45, and 1% aged 46-60
- Device: Android mobile phone; iPhone
- Language: Mandarin; English
- Application scenarios: speech recognition; voiceprint recognition
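The format line in the specification (16 kHz, 16-bit, uncompressed, mono) can be checked on any individual recording. The following is a minimal sketch using only Python's standard `wave` module; the names `check_wav` and `make_silent_wav` are hypothetical helpers written for this illustration and are not part of any tooling shipped with the dataset.

```python
import io
import wave

# Expected values taken from the format specification above.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav(source):
    """Return a list of spec mismatches for a WAV path or file-like object."""
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems


def make_silent_wav(rate_hz, channels, seconds=0.1):
    """Build a small in-memory 16-bit PCM WAV, used only to exercise check_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds) * channels)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav(make_silent_wav(16000, 1)))  # conforming file
    print(check_wav(make_silent_wav(8000, 2)))   # wrong rate and channel count
```

An empty list means the file matches the stated format. To check a real recording, pass its file path to `check_wav` instead of the in-memory buffer.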
Sample
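- 讲座最后一个分支是worm的形成 (The last branch of the lecture is the formation of the worm)
- 定位@NIKEA总部位置 (Locate the @NIKEA headquarters)
- [N]新年第一天来了个reject扎心啊 (Got a reject on the first day of the new year, what a blow)
- 切换嘻哈风格的Code of Honor听。[N] (Switch to the hip-hop style Code of Honor to listen.)
- [S]话说airmail就要洋气得多[N] (Speaking of which, airmail is much more stylish)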
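The sample transcripts interleave Chinese characters, English words, and bracketed tags such as `[N]` and `[S]`. As a rough illustration of how such a line might be split into language runs before further processing, here is a minimal Python sketch. The function name `segment` and the token pattern are assumptions made for this example; the tags are treated as opaque tokens because their meaning is not documented on this page.

```python
import re

# Bracketed tags such as [N] and [S] appear in the sample transcripts; their
# meaning is not documented here, so they are kept as opaque "tag" tokens.
TOKEN_PATTERN = re.compile(
    r"(?P<tag>\[[A-Z]+\])"                      # bracketed annotation tag
    r"|(?P<zh>[\u4e00-\u9fff]+)"                # run of common CJK ideographs
    r"|(?P<en>[A-Za-z]+(?:[ '][A-Za-z]+)*)"     # English words, spaces allowed
)


def segment(transcript):
    """Split a transcript into (kind, text) pairs; other characters are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(transcript)]


if __name__ == "__main__":
    for kind, text in segment("切换嘻哈风格的Code of Honor听。[N]"):
        print(kind, text)
```

Punctuation and symbols such as `。` and `@` match none of the three alternatives and are dropped. A production pipeline would likely need a wider character range and proper handling of digits.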