Cleaning of Online Bangla Free-form Handwritten Text

In normal free-form handwritten text, repetition (writing the same stroke several times in the same place), over-writing, and crossing out are very common. In this article, we refer to these three types of writing collectively as “noise.” Cleaning such noisy text to extract the useful text is an important step toward robust recognition. To the best of our knowledge, no work has been reported on cleaning such noise from online text in any script; hence, in this article, we propose an automatic text-cleaning approach for online handwriting recognition. First, crossing-out noise made with straight strike-through lines is detected using straightness criteria of the online strokes. Next, regions containing repetition, over-writing, and other types of crossing out are located using the positional information of overlapping strokes. Features such as stroke density and stroke self-intersections are computed from the strokes of each located region to predict the type of noise, and the predicted type determines how the region is cleaned. For crossing-outs, all strokes of the crossed-out region are removed. For repetition and over-writing, the strokes written earlier are removed and the latest strokes are kept. Finally, delayed strokes are properly rearranged and the word is passed to an online recognizer. Although recognizing free-form handwriting is quite difficult, we obtained an improvement of up to 70.71% in word-recognition accuracy after noise cleaning.
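The cleaning pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`Stroke`, `straightness`, `is_strike_through`, `overlap_regions`, `self_intersections`, `stroke_density`, `clean_region`) is hypothetical. It rests on several assumptions. A stroke is a list of (x, y) points with a writing-order index. The straightness criterion is approximated by the chord-to-arc ratio, with made-up thresholds (`min_straightness`, `min_length`). Overlapping regions are approximated by connected components of bounding-box overlap. The noise type is passed in rather than predicted, because the abstract does not specify the classifier. Delayed-stroke rearrangement is not shown.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Stroke:
    order: int            # writing order; a larger value means written later
    points: list[Point]   # sampled (x, y) pen-down positions


def path_length(points: list[Point]) -> float:
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def straightness(stroke: Stroke) -> float:
    """Chord-to-arc ratio: 1.0 for a perfectly straight stroke, lower if curved.
    Assumed proxy for the paper's straightness criteria."""
    arc = path_length(stroke.points)
    if arc == 0:
        return 0.0
    return math.dist(stroke.points[0], stroke.points[-1]) / arc


def is_strike_through(stroke: Stroke, min_straightness: float = 0.95,
                      min_length: float = 50.0) -> bool:
    # Hypothetical thresholds: a long, nearly straight stroke counts as a strike-through.
    return (straightness(stroke) >= min_straightness
            and path_length(stroke.points) >= min_length)


def bbox(stroke: Stroke) -> tuple[float, float, float, float]:
    xs = [x for x, _ in stroke.points]
    ys = [y for _, y in stroke.points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a: Stroke, b: Stroke) -> bool:
    ax0, ay0, ax1, ay1 = bbox(a)
    bx0, by0, bx1, by1 = bbox(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def overlap_regions(strokes: list[Stroke]) -> list[list[Stroke]]:
    """Group strokes into connected components of bounding-box overlap,
    each sorted by writing order (assumed stand-in for region location)."""
    remaining = set(range(len(strokes)))
    regions = []
    while remaining:
        stack = [remaining.pop()]
        members = []
        while stack:
            i = stack.pop()
            members.append(i)
            touching = [j for j in remaining if boxes_overlap(strokes[i], strokes[j])]
            remaining.difference_update(touching)
            stack.extend(touching)
        regions.append(sorted((strokes[i] for i in members), key=lambda s: s.order))
    return regions


def _orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def self_intersections(stroke: Stroke) -> int:
    """Count proper crossings between non-adjacent segments of one stroke."""
    segs = list(zip(stroke.points, stroke.points[1:]))
    count = 0
    for i in range(len(segs)):
        for j in range(i + 2, len(segs)):
            (p1, p2), (p3, p4) = segs[i], segs[j]
            if (_orient(p3, p4, p1) * _orient(p3, p4, p2) < 0
                    and _orient(p1, p2, p3) * _orient(p1, p2, p4) < 0):
                count += 1
    return count


def stroke_density(region: list[Stroke]) -> float:
    """Total ink length per unit bounding-box area of a region."""
    xs = [x for s in region for x, _ in s.points]
    ys = [y for s in region for _, y in s.points]
    area = max((max(xs) - min(xs)) * (max(ys) - min(ys)), 1e-9)  # avoid division by zero
    return sum(path_length(s.points) for s in region) / area


def clean_region(region: list[Stroke], noise_type: str) -> list[Stroke]:
    if noise_type == "crossing_out":
        return []                                    # drop every stroke of the region
    if noise_type in ("repetition", "overwriting"):
        return [max(region, key=lambda s: s.order)]  # simplification: keep only the latest stroke
    return region                                    # no noise: keep unchanged
```

Keeping only the single most recent stroke for repetition and over-writing is a deliberate simplification. The approach described above keeps the latest strokes, which for over-writing may include several strokes of the newly written character. In practice, `stroke_density` and `self_intersections` would feed whatever classifier predicts the noise type for each region.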
