Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text

A Chinese handwriting database named HIT-MW is presented to facilitate offline recognition of Chinese handwritten text. Both the writers and the texts to be hand-copied are carefully sampled using a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through intermediaries rather than face to face. The current version of HIT-MW comprises 853 forms and 186,444 characters, all written under unconstrained conditions without preprinted character boxes. Statistics show that the database is highly representative of real-world handwriting. The database can therefore support many new applications in the recognition of real handwriting.
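The two totals given for HIT-MW also indicate roughly how much text each form carries. Dividing characters by forms gives a mean only; individual forms vary in length, and the abstract does not report the per-form distribution:

```latex
\bar{n}_{\text{chars/form}} = \frac{186{,}444}{853} \approx 218.6
```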
