
Datatang adds 5,000 Traditional Chinese characters to OCR system

Beijing-based artificial intelligence (AI) company Datatang has updated its optical character recognition (OCR) database to include 5,000 handwritten characters in Traditional Chinese.

On a dedicated webpage for the new set, Datatang said the characters were collected from handwriting samples on A4 paper, square paper, and lined paper, among others.

By adding the characters to its software suite, Datatang enables customers to run OCR on handwritten Traditional Chinese characters they encounter in the wild. In other words, by scanning text with a smartphone and the Datatang app, users will now be able to automate data entry and form filling.

OCR is sometimes implemented for document scanning in digital identity verification and onboarding applications.

According to the company, for an annotation to qualify, the error of each vertex of the quadrilateral bounding box around a character must be within five pixels. Bounding-box accuracy and text transcription accuracy are both reportedly at least 97 percent.
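To make the five-pixel criterion concrete, the following is a minimal sketch of how such a check could be computed. It is not Datatang's method: the company does not say how vertex error is measured, so the sketch assumes Euclidean distance to a trusted reference box with vertices listed in the same order. The names `box_is_qualified` and `box_accuracy` are hypothetical.

```python
from math import hypot
from typing import Sequence, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]


def box_is_qualified(annotated: Quad, reference: Quad,
                     tolerance_px: float = 5.0) -> bool:
    """Return True if every vertex lies within tolerance_px of its reference.

    Assumption: error is the Euclidean distance between corresponding
    vertices, and both boxes list their vertices in the same order.
    """
    return all(
        hypot(ax - rx, ay - ry) <= tolerance_px
        for (ax, ay), (rx, ry) in zip(annotated, reference)
    )


def box_accuracy(pairs: Sequence[Tuple[Quad, Quad]],
                 tolerance_px: float = 5.0) -> float:
    """Fraction of (annotated, reference) pairs that pass the vertex check."""
    if not pairs:
        raise ValueError("at least one box pair is required")
    qualified = sum(box_is_qualified(a, r, tolerance_px) for a, r in pairs)
    return qualified / len(pairs)


if __name__ == "__main__":
    reference = ((0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0))
    # First vertex is off by exactly 5 pixels (a 3-4-5 triangle): passes.
    good = ((3.0, 4.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0))
    # First vertex is off by about 5.66 pixels: fails.
    bad = ((4.0, 4.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0))
    print(box_accuracy([(good, reference), (bad, reference)]))
```

Under these assumptions, a dataset would meet the reported threshold when `box_accuracy` returns at least 0.97 over the sampled boxes.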

The addition of the new dataset comes months after Datatang executives said their speech recognition datasets were created with native language speakers and surpassed industry standards.

More recently, the company showcased its synthetic data generation technology at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR 2022).
