Online handwritten script recognition

Automatic identification of handwritten script enables applications such as automatic transcription of multilingual documents and searching the Web for documents that contain a particular script. The growing use of handheld devices that accept handwritten input has created a demand for algorithms that can efficiently analyze and retrieve handwritten data. This paper proposes a method to classify words and lines in an online handwritten document into one of six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification is based on 11 spatial and temporal features extracted from the strokes of the words. The proposed system attains an overall classification accuracy of 87.1 percent at the word level, using 5-fold cross-validation on a data set containing 13,379 words. The accuracy improves to 95 percent when the test sample is increased to five words, and to 95.5 percent for complete text lines, which contain seven words on average.
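
As a rough illustration of the 5-fold cross-validation protocol mentioned above, the sketch below partitions a data set of 13,379 samples into five nearly equal folds and averages per-fold accuracies. The function names `k_fold_indices` and `mean_accuracy` are hypothetical, and the contiguous, unshuffled split is an assumption made for brevity; the abstract does not say how the folds were actually formed.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    Assumption: no shuffling or per-script stratification is applied here.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def mean_accuracy(per_fold_accuracy: Sequence[float]) -> float:
    """Average the accuracies measured on the k held-out folds."""
    return sum(per_fold_accuracy) / len(per_fold_accuracy)


folds = k_fold_indices(13379, k=5)
fold_sizes = [len(test) for _, test in folds]
```

With 13,379 words, four folds hold 2,676 words and one holds 2,675, so each word is used for testing exactly once. In practice the samples would normally be shuffled, and possibly stratified by script, before splitting.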
