A deep learning approach to handwritten text recognition in the presence of struck-out text

The accuracy of handwritten text recognition may be affected by the presence of struck-out text in a handwritten manuscript. This paper investigates and improves the performance of a widely used handwritten text recognition approach, the Convolutional Recurrent Neural Network (CRNN), on handwritten lines containing struck-out words. For this purpose, some common types of struck-out strokes were superimposed on words in a text line. A model trained on the IAM line database was tested on lines containing struck-out words, and the Character Error Rate (CER) increased from 0.09 to 0.11. The model was then re-trained on a dataset containing struck-out text, after which it performed well in terms of struck-out text detection. We found that, given an adequate number of training examples, the model can learn struck-out patterns without affecting the overall recognition accuracy.
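For readers unfamiliar with the metric, the following is a minimal sketch of how a Character Error Rate is commonly computed: the Levenshtein edit distance between the recognized and reference transcriptions, divided by the number of reference characters. The abstract does not state how CER was normalized, so the corpus-level normalization used here is an assumption, and the function names `levenshtein` and `character_error_rate` are illustrative, not taken from the paper.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed corpus-level CER: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, a CER of 0.09 corresponds to roughly nine character edits per hundred reference characters, and 0.11 to roughly eleven.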
