70,000 Images - English Scene with OCR Annotation
- 71,725 pictures
- 291,651 line annotations
- 95% annotation accuracy
Datatang has passed ISO 27001 Information Security Management System and ISO 9001 Quality Management System certification.
This dataset contains 71,725 English scene-text images collected from everyday natural scenes in the United States and the United Kingdom. The text in the images covers a rich variety of angles and lighting conditions. The data is annotated at the line, character, and word levels, and the scene text content is transcribed. The dataset can be used for OCR tasks in natural scenes.
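As a rough illustration of the three annotation levels described above, one image's annotations could be held in memory as sketched below. The delivery format is not specified in this description, so every name here (`Box`, `TextAnnotation`, `ImageRecord`, the field names, and the sample file name and text) is hypothetical and not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    # Axis-aligned rectangle in pixel coordinates (hypothetical layout).
    x: int
    y: int
    width: int
    height: int


@dataclass
class TextAnnotation:
    level: str  # "line", "word", or "character"
    box: Box    # rectangular bounding box around the text
    text: str   # transcribed scene text inside the box


@dataclass
class ImageRecord:
    file_name: str
    annotations: List[TextAnnotation] = field(default_factory=list)

    def by_level(self, level: str) -> List[TextAnnotation]:
        # Return only the annotations made at the requested level.
        return [a for a in self.annotations if a.level == level]


# Made-up sample record, for illustration only.
record = ImageRecord(
    file_name="example_0001.jpg",
    annotations=[
        TextAnnotation("line", Box(40, 60, 200, 32), "OPEN DAILY"),
        TextAnnotation("word", Box(40, 60, 90, 32), "OPEN"),
        TextAnnotation("word", Box(140, 60, 100, 32), "DAILY"),
    ],
)

words = [a.text for a in record.by_level("word")]
print(words)
```

Keeping the level as a field on each annotation, rather than nesting words inside lines, is only one possible design; it makes filtering by level straightforward.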
- Characteristics of Words
- regular fonts; irregular fonts should account for no more than 10% of total characters.
- Number of characters
- 2-200. Text direction: normal horizontal direction. Legibility of words: legible, readable, and accurate words that can be clearly distinguished by the human eye.
- Collection conditions
- Data size
- 71,725 images, 291,651 line annotations, 979,744 word annotations, 5,015,589 character annotations.
- Word characteristics
- natural scenes, regular fonts, 2-200 words, legible, readable and accurate words
- Collection device
- cellphone, camera and tablet PC
- Collection place
- USA, UK; natural everyday scenes, indoor and outdoor.
- Shooting angle
- A. In-plane rotation between -15 degrees and +15 degrees; B. Out-of-plane rotation between -15 degrees and +15 degrees; no distorted words. Light distribution: natural light, controllable light (lamp light)
- Image parameters
- image format: JPG
- rectangular bounding box, character annotation, word annotation, transcribed scene text.
- Bounding box annotation accuracy is maintained above 95%
- Excludes illegal images and images containing anti-religious, anti-political, or anti-traditional-custom content, eroticism, violence, or terror
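The totals listed under data size imply some rough per-item averages, which can help when estimating annotation density. The short sketch below only divides the published totals; the variable and function names are illustrative, and the averages are derived figures rather than numbers stated by the provider.

```python
# Totals as listed under "Data size".
IMAGES = 71_725
LINES = 291_651
WORDS = 979_744
CHARACTERS = 5_015_589


def average(total: int, count: int) -> float:
    # Mean number of `total` items per `count` item.
    return total / count


lines_per_image = average(LINES, IMAGES)
words_per_line = average(WORDS, LINES)
chars_per_word = average(CHARACTERS, WORDS)

print(f"lines per image:     {lines_per_image:.2f}")
print(f"words per line:      {words_per_line:.2f}")
print(f"characters per word: {chars_per_word:.2f}")
```

This works out to roughly 4.1 annotated lines per image, 3.4 words per line, and 5.1 characters per word.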