A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms

To support the evaluation of touching-character segmentation algorithms, we present CASIA-HWDB-T, a database of touching characters collected from the Chinese handwriting database CASIA-HWDB. It contains 56,469 touching strings of two or more characters, of which 1,818 contain multiple-touching characters. The strings are partitioned into 50,157 all-Chinese, 2,788 all-digit, 328 all-letter, and 3,196 mixed-character strings. Every string is annotated with its character classes, the locations of its touching points, and auxiliary values such as string height and average stroke width. Finally, we report the segmentation performance of three existing algorithms on this database as a reference.
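
As an illustration of the annotation and the four-way partition described above, the following is a minimal sketch in Python. The record layout, the field names, and the `string_category` helper are hypothetical and are not the database's actual file format; the Chinese test is also an assumption, covering only the basic CJK Unified Ideographs block. Only the four partition counts are taken from the abstract.

```python
import string
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TouchingString:
    """Hypothetical record for one annotated touching string."""
    labels: str                             # character classes, one per character
    touching_points: List[Tuple[int, int]]  # (x, y) locations of touching points
    height: int                             # string height in pixels
    avg_stroke_width: float                 # average stroke width in pixels


def is_chinese(ch: str) -> bool:
    # Assumption: "Chinese" means the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def string_category(labels: str) -> str:
    """Assign a string to one of the four partitions named in the abstract."""
    if not labels:
        raise ValueError("a touching string needs at least one character")
    if all(is_chinese(c) for c in labels):
        return "all-Chinese"
    if all(c in string.digits for c in labels):
        return "all-digit"
    if all(c in string.ascii_letters for c in labels):
        return "all-letter"
    return "mixed"


# Partition sizes as reported in the abstract.
PARTITION_COUNTS = {
    "all-Chinese": 50_157,
    "all-digit": 2_788,
    "all-letter": 328,
    "mixed": 3_196,
}
TOTAL_STRINGS = sum(PARTITION_COUNTS.values())
```

The four reported partition sizes sum to the 56,469 strings stated for the whole database, so the partition is exhaustive and non-overlapping.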
