Online handwritten script recognition

Automatic identification of handwritten script facilitates many important applications, such as automatic transcription of multilingual documents and searching the Web for documents that contain a particular script. The growing use of handheld devices that accept handwritten input has created a demand for algorithms that can efficiently analyze and retrieve handwritten data. This paper proposes a method to classify words and lines in an online handwritten document into one of six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification is based on 11 spatial and temporal features extracted from the strokes of the words. The proposed system attains an overall classification accuracy of 87.1 percent at the word level, measured with 5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves to 95 percent when the number of words in the test sample is increased to five, and to 95.5 percent for complete text lines, which contain seven words on average.
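The evaluation protocol and the gain from multi-word samples can be illustrated with a minimal Python sketch. It rests on assumptions that go beyond the abstract: the abstract does not say how the folds were formed or how evidence from several words is combined, so `k_fold_indices` (a plain contiguous split, without the shuffling a real experiment would normally apply) and `combine_word_posteriors` (summing per-word log-posteriors, which treats the words as independent) are hypothetical helpers and not the authors' method. The per-word posterior values in the usage note are invented for illustration only.

```python
import math
from typing import Dict, List

# The six script classes named in the abstract.
SCRIPTS = ["Arabic", "Cyrillic", "Devnagari", "Han", "Hebrew", "Roman"]


def k_fold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k contiguous, nearly equal folds.

    Assumption: a plain contiguous split; the source does not describe
    how its folds were built.
    """
    if k < 1 or k > n_samples:
        raise ValueError("k must be between 1 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def combine_word_posteriors(word_posteriors: List[Dict[str, float]]) -> str:
    """Return the script with the highest summed log-posterior over all words.

    Assumption: words are treated as independent, so their per-word
    posteriors multiply (log-posteriors add). Missing scripts count as ~0.
    """
    if not word_posteriors:
        raise ValueError("need at least one word")
    eps = 1e-12  # avoids log(0) for scripts with zero or missing posterior
    totals = {s: 0.0 for s in SCRIPTS}
    for posterior in word_posteriors:
        for s in SCRIPTS:
            totals[s] += math.log(posterior.get(s, 0.0) + eps)
    return max(totals, key=totals.get)
```

For example, `k_fold_indices(13379, 5)` yields four folds of 2,676 words and one of 2,675, with each fold serving once as the test set. Given hypothetical per-word posteriors in which one of three words favours Cyrillic while the other two favour Roman, `combine_word_posteriors` returns Roman, which shows how pooling several words can override a single misclassified word.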
