
222,289 Images – Chinese OCR Data in Natural Scenes
- 222,289 images
- Line-level annotation
- Word-level annotation
Datatang is certified under ISO/IEC 27001 (Information Security Management System) and ISO 9001 (Quality Management System).


Data Introduction
This dataset contains 222,289 images of Chinese text in natural scenes, collected in both indoor and outdoor environments. It covers multiple scenes and multiple shooting angles. Each image is annotated with line-level, word-level and character-level bounding boxes, each paired with a text transcription. The dataset can be used for OCR tasks in natural scenes.
Data Specification
- Data size: 222,289 images
- Collecting environment: indoor and outdoor scenes
- Data diversity: multiple scenes, multiple shooting angles
- Device: cellphones, cameras
- Shooting angle: looking-up angle, looking-down angle, eye-level angle
- Data format: images in .jpg, .png and .jpeg; annotations in .json
- Annotation content: rectangular bounding boxes with text transcriptions at the line, word and character levels
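The specification names the three annotation levels and the .json format but does not publish the JSON schema. The sketch below therefore assumes a hypothetical layout: top-level `lines`, `words` and `chars` arrays whose items carry a `bbox` given as `[x_min, y_min, x_max, y_max]` and a `text` transcription. These field names, the `SAMPLE` record and the helper functions are illustrative assumptions, so check the delivered files before relying on them. The sketch parses all three levels into one record type and runs a simple consistency check: the word transcriptions that fall inside a line box, read left to right, should concatenate to that line's transcription.

```python
import json
from dataclasses import dataclass
from typing import List

# HYPOTHETICAL schema: "lines", "words", "chars", "bbox" and "text" are
# assumed field names for illustration, not the documented Datatang format.
LEVEL_KEYS = {"line": "lines", "word": "words", "char": "chars"}


@dataclass
class TextBox:
    level: str    # "line", "word" or "char"
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    text: str     # transcription of the text inside the rectangle


def parse_annotation(raw: str) -> List[TextBox]:
    """Flatten the three annotation levels of one JSON record into TextBoxes."""
    data = json.loads(raw)
    boxes = []
    for level, key in LEVEL_KEYS.items():
        for item in data.get(key, []):
            x_min, y_min, x_max, y_max = item["bbox"]
            boxes.append(TextBox(level, float(x_min), float(y_min),
                                 float(x_max), float(y_max), item["text"]))
    return boxes


def contains(outer: TextBox, inner: TextBox) -> bool:
    """True if the inner rectangle lies entirely within the outer one."""
    return (outer.x_min <= inner.x_min and outer.y_min <= inner.y_min
            and outer.x_max >= inner.x_max and outer.y_max >= inner.y_max)


def words_match_line(line: TextBox, words: List[TextBox]) -> bool:
    """Check that words inside a line, sorted by x, spell the line text.

    Assumes horizontal, left-to-right text with no separators between
    words; vertical Chinese text would need a different ordering.
    """
    inside = sorted((w for w in words if contains(line, w)), key=lambda w: w.x_min)
    return "".join(w.text for w in inside) == line.text


# Made-up example record in the assumed schema (not real dataset content).
SAMPLE = json.dumps({
    "lines": [{"bbox": [10, 20, 210, 60], "text": "中文识别"}],
    "words": [
        {"bbox": [10, 20, 110, 60], "text": "中文"},
        {"bbox": [110, 20, 210, 60], "text": "识别"},
    ],
    "chars": [
        {"bbox": [10, 20, 60, 60], "text": "中"},
        {"bbox": [60, 20, 110, 60], "text": "文"},
        {"bbox": [110, 20, 160, 60], "text": "识"},
        {"bbox": [160, 20, 210, 60], "text": "别"},
    ],
}, ensure_ascii=False)

if __name__ == "__main__":
    boxes = parse_annotation(SAMPLE)
    lines = [b for b in boxes if b.level == "line"]
    words = [b for b in boxes if b.level == "word"]
    for line in lines:
        print(line.text, "consistent:", words_match_line(line, words))
```

Flattening every level into a single `TextBox` type keeps downstream code (cropping, filtering by level, containment checks) uniform; for real files, only `LEVEL_KEYS` and the field access in `parse_annotation` should need adjusting to the actual schema.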