Fast Chinese calligraphic character recognition with large-scale data

Chinese calligraphy attracts wide attention for its beauty and elegance. However, because the shapes and styles of calligraphic characters are complex, common users find them difficult to recognize, and a tool that helps users recognize unknown calligraphic characters would be valuable. Conventional OCR (Optical Character Recognition) technology is of little help here because calligraphic characters are heavily deformed and complex. In CADAL, a Calligraphic Character Dictionary (CalliCD), which contains character images labeled with their semantic meaning, has been constructed and made available to common users online. With CalliCD, a user can learn more about an unknown calligraphic character by performing a similarity-based search. As CalliCD grows, however, one-to-one similarity-based search takes an intolerable amount of time, and strategies that can handle large-scale data are needed. In this paper, a fast recognition schema based on retrieval is proposed. In addition, a novel shape descriptor, called GIST-SC, is proposed to represent calligraphic character images for efficient and effective retrieval. The schema works in three steps. First, approximate nearest neighbors of the character image to be recognized are found quickly. Second, one-to-one fine matching is performed between these approximate nearest neighbors and the character image to be recognized. Finally, a recognition result based on semantic probability is given. Our experiments show that the GIST-SC descriptor and the recognition schema are efficient and effective for Chinese calligraphic character recognition with CalliCD.
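The three-step schema can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper, and it rests on several assumptions that the abstract does not confirm: descriptor vectors (standing in for GIST-SC) are taken as precomputed NumPy arrays, the coarse step uses random-hyperplane hashing with Hamming ranking, the fine step uses plain Euclidean distance in place of shape matching, and the semantic probability is a distance-weighted vote over the labels of the matched images. All function and parameter names are hypothetical.

```python
import numpy as np
from collections import defaultdict


def build_hash_index(descriptors, n_bits=16, seed=0):
    """Hash every database descriptor to a binary code (assumed: random hyperplanes)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((descriptors.shape[1], n_bits))
    codes = descriptors @ planes > 0  # boolean array, one row of bits per image
    return planes, codes


def coarse_candidates(query, planes, codes, n_candidates=20):
    """Step 1: approximate nearest neighbors, ranked by Hamming distance between codes."""
    q_code = query @ planes > 0
    hamming = np.count_nonzero(codes != q_code, axis=1)
    return np.argsort(hamming, kind="stable")[:n_candidates]


def fine_match(query, descriptors, candidates, k=5):
    """Step 2: one-to-one matching restricted to the candidates (assumed: Euclidean distance)."""
    dists = np.linalg.norm(descriptors[candidates] - query, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return candidates[order], dists[order]


def semantic_probabilities(labels, matched, dists):
    """Step 3: turn the labels of the matched images into a probability per meaning."""
    weights = 1.0 / (1.0 + dists)  # closer matches get larger votes (assumed weighting)
    scores = defaultdict(float)
    for idx, w in zip(matched, weights):
        scores[labels[idx]] += w
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def recognize(query, descriptors, labels, planes, codes):
    """Run the three steps and return the most probable label with all probabilities."""
    cands = coarse_candidates(query, planes, codes)
    matched, dists = fine_match(query, descriptors, cands)
    probs = semantic_probabilities(labels, matched, dists)
    return max(probs, key=probs.get), probs
```

The point of this layout is that the costly one-to-one comparison in step 2 touches only `n_candidates` images rather than the whole dictionary, so the per-query cost of fine matching no longer grows with the size of the database.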
