The value of a call center depends on customers' demand for service quality. As a hub connecting enterprises and customers, the call center is an important platform for services such as information consultation, after-sales support, and complaint handling. Ensuring the quality of customer service not only improves customer relationships but also helps establish a good corporate image and maintain corporate credibility. Quality inspection is therefore very important in the customer service industry. With the development of computer technology, existing customer service systems have been upgraded, and in AI-empowered call centers, intelligent customer service has become a trend.

However, the complexity of language poses a challenge to speech recognition technology. In many situations, people talk in a very natural way. Dialects, personal speaking habits, and the recording environment all strongly affect the quality of voice data. Identifying and filtering sensitive words spoken in dialects has become an urgent technical problem for intelligent speech quality inspection.
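To make the filtering problem concrete, here is a minimal sketch that runs on text transcripts rather than audio. It assumes a hand-built table that maps dialect spellings of a term to one canonical sensitive word. The function names and the sample terms are hypothetical placeholders, not part of any Datatang product.

```python
from typing import Dict, List, Tuple


def find_sensitive(transcript: str, variants: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (position, matched text, canonical term) for every hit, in order."""
    hits = []
    for surface, canonical in variants.items():
        start = transcript.find(surface)
        while start != -1:
            hits.append((start, surface, canonical))
            start = transcript.find(surface, start + 1)
    return sorted(hits)


def mask_sensitive(transcript: str, variants: Dict[str, str]) -> str:
    """Replace every listed spelling with asterisks of the same length."""
    # Longest spellings first, so a short variant cannot split a longer one.
    for surface in sorted(variants, key=len, reverse=True):
        transcript = transcript.replace(surface, "*" * len(surface))
    return transcript


# Hypothetical table: the standard spelling plus one dialect spelling.
VARIANTS = {"scam": "scam", "skam": "scam"}

text = "this is a skam, a real scam"
hits = find_sensitive(text, VARIANTS)
masked = mask_sensitive(text, VARIANTS)
print(hits)
print(masked)
```

Plain substring matching like this only works once the transcript already reflects what the speaker said, which is why dialect-aware recognition has to come first.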

Speech recognition and natural language understanding are the basic technologies of intelligent voice interaction. "Mastering" a variety of dialects is an important guarantee for intelligent customer service to improve the quality of dialogue during interaction and make conversations with robots "influence-free". The root cause of deviations in speech recognition lies in the data. If common dialect sentences are recorded in pinyin and phonetic notation and integrated into a database for artificial intelligence to learn from, the accuracy of speech recognition can improve greatly. The larger and more diverse the set of speech samples in the corpus, the more accurate the resulting model.

For natural dialogue scenarios in various dialects, Datatang has designed and produced natural conversational voice data in Cantonese and the Henan dialect, as well as a Sichuan dialect pronunciation dictionary. Datatang strictly abides by the terms of the GDPR and has passed ISO27001 information security management system and ISO9001 quality management system certification. The collected voice data has been authorized by the speakers who were recorded.

607 Hours - Cantonese Conversational Speech Data by Mobile Phone and Voice Recorder

995 local Cantonese speakers participated in the recording and communicated face to face in a natural way. They freely discussed a number of given topics covering a wide range of fields; the speech is natural and fluent, in line with real dialogue scenarios. The text is transcribed manually, with high accuracy.

60,000 SiChuan Dialect Pronunciation Dictionary

This pronunciation dictionary collects words with dialect characteristics in Sichuan. Each entry consists of three parts: the word, its pinyin, and its tones. The dictionary can serve as a pronunciation reference for recording personnel and support research and development of pronunciation recognition technology, among other uses.
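As an illustration of the three-part entry structure, the sketch below loads entries into a lookup table. The tab-separated layout, the names `Entry`, `parse_line`, and `build_lexicon`, and the sample rows are all assumptions made for this example. The actual file format of the dictionary may differ, and the tone numbers shown are placeholders, not values taken from the dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    word: str          # dialect word as written
    pinyin: List[str]  # one syllable per character
    tones: List[int]   # one tone number per syllable


def parse_line(line: str) -> Entry:
    """Parse one 'word<TAB>pinyin<TAB>tones' line (assumed layout)."""
    word, pinyin, tones = line.rstrip("\n").split("\t")
    syllables = pinyin.split()
    tone_numbers = [int(t) for t in tones.split()]
    if len(syllables) != len(tone_numbers):
        raise ValueError(f"syllable/tone count mismatch for {word!r}")
    return Entry(word, syllables, tone_numbers)


def build_lexicon(lines: List[str]) -> Dict[str, Entry]:
    """Index entries by word, skipping blank lines."""
    entries = (parse_line(line) for line in lines if line.strip())
    return {entry.word: entry for entry in entries}


# Placeholder rows for illustration only; tone values are not authoritative.
SAMPLE_LINES = [
    "巴适\tba shi\t1 2",
    "摆龙门阵\tbai long men zhen\t3 2 2 4",
]

lexicon = build_lexicon(SAMPLE_LINES)
print(lexicon["巴适"])
```

Keeping pinyin and tones as parallel lists makes it easy to check that every syllable has a tone before an entry is handed to a recognition or recording workflow.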

463 Speakers-HENAN Dialect Speech Data by Mobile phone

This dataset collects speech from 463 Henan locals with authentic accents. The recorded content covers daily messages and customer consultations across multiple fields. It is checked and proofread by Henan locals to ensure high accuracy. The data was recorded on Android phones and iPhones.

By learning from a large amount of speech data and its patterns, artificial intelligence can better understand different dialects and improve its recognition capabilities. By collecting dialects that are currently underrepresented in the speech recognition industry, the Datatang series of datasets aims to improve the accuracy of customer speech recognition technology.


