
794 Hours - Sichuan Dialect Speech Data by Mobile Phone
- 2,507 people
- quiet, indoor
- 16kHz, 16bit, wav
Datatang is certified to the ISO27001 Information Security Management System and ISO9001 Quality Management System standards.


Data Introduction
The dataset contains recordings of 2,507 speakers from the Sichuan Basin, made in quiet indoor environments. The recorded content covers customer consultation and text messages in many fields. The average number of repetitions is 1.3 and the average sentence length is 12.5 words. Sichuan natives take part in quality inspection and proofreading to ensure the accuracy of the text transcription.
Data Specification
- Format: 16kHz, 16bit, uncompressed wav, mono channel
- Recording environment: quiet indoor environment, without echo
- Recording content (read speech): generic category, human-machine interaction category
- Speaker: 2,507 people, 1,749 females, accounting for 70%
- Device: Android mobile phone and iPhone
- Language: Sichuan dialect
- Transcription content: text, 4 noise symbols
- Application scenarios: speech recognition, voiceprint recognition
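The format line of the specification can be checked mechanically before training on the audio. The following is a minimal sketch using only the Python standard library; the function names `check_wav` and `make_test_wav` are hypothetical, and the in-memory file exists only so the example runs without access to the dataset.

```python
import io
import wave

# Expected values taken from the specification above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def check_wav(source):
    """Return a list of mismatches between a WAV file and the stated format.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels")
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems


def make_test_wav(rate_hz, channels=1, sample_width=2, seconds=0.1):
    """Build a silent in-memory WAV, used only to demonstrate check_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        n_frames = int(rate_hz * seconds)
        wav.writeframes(b"\x00" * (n_frames * channels * sample_width))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav(make_test_wav(16_000)))  # file matching the specification
    print(check_wav(make_test_wav(8_000)))   # file with the wrong sample rate
```

For real data, pass the path of a recording to `check_wav` instead of the in-memory buffer; an empty list means the file matches the stated format.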
Sample
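- 记录了他们在城市夹缝里生存和成长的青春故事 (It records the stories of their youth, surviving and growing up in the cracks of the city)
- 我会第一时间告诉你的噻 (I will tell you right away)
- <STA>你晓得女孩儿都在想啥子吗 (Do you know what girls are thinking about?)
- 一辆黑色广本车的挡风玻璃被电动车电瓶葬咾个洞 (The windshield of a black Guangqi Honda car had a hole smashed in it by an electric scooter battery)
- 内蒙古的省会在哪儿咹 (Where is the capital of Inner Mongolia?)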
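
One of the samples carries a tag, `<STA>`, which appears to be one of the four noise symbols mentioned in the specification. The page does not list the other three, so the sketch below assumes that every noise symbol is an uppercase tag in angle brackets; the pattern and the function name `split_transcript` are hypothetical and should be adjusted to the dataset's actual documentation.

```python
import re

# Assumption: noise symbols look like <STA>, i.e. uppercase letters in
# angle brackets. Only <STA> is confirmed by the samples on this page.
NOISE_TAG = re.compile(r"<[A-Z]+>")


def split_transcript(line):
    """Separate the spoken text from any noise tags in one transcript line."""
    tags = NOISE_TAG.findall(line)
    text = NOISE_TAG.sub("", line).strip()
    return text, tags


if __name__ == "__main__":
    text, tags = split_transcript("<STA>你晓得女孩儿都在想啥子吗")
    print(text)
    print(tags)
```

Keeping the tags alongside the cleaned text, rather than discarding them, leaves the choice open between training on clean text only and modelling the noise events explicitly.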