Statistical script independent word spotting in offline handwritten documents

We propose a statistical, script-independent, line-based word spotting framework for offline handwritten documents based on Hidden Markov Models (HMMs). We present an exhaustive comparative study of filler models and background models for better representation of background or non-keyword text. Candidate keywords are pruned in a two-stage spotting framework using character-based and lexicon-based background models. The system handles large vocabularies without requiring word or character segmentation. The script-independent word spotting system is evaluated on a mixed corpus of public datasets in several scripts: IAM for English, AMA for Arabic, and LAW for Devanagari.

Highlights:
- The line model allows one or more occurrences of keywords surrounded by filler models.
- We investigate five different filler models.
- Character filler models (CFMs) provide the best results.
- The optimum number of character filler models varies across scripts.
- Two background models are investigated to prune candidate keyword regions.
- The proposed framework is faster and more accurate than the current state of the art.
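The idea of a line model in which a keyword is surrounded by filler models, scored against a filler-only background model, can be illustrated with a small Viterbi decoder. This is a minimal sketch under strong simplifying assumptions, not the authors' implementation: each character is a single HMM state with discrete emissions (`log_emit[c][o]` over hypothetical vector-quantized observation symbols), transition probabilities are ignored, the filler is an unconstrained character loop, and the keyword score is taken as the log-likelihood ratio between the line model and the filler-only background model. The function names `filler_frame_scores` and `spot_keyword` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def filler_frame_scores(obs: Sequence[int], log_emit: List[List[float]]) -> List[float]:
    """Character filler loop with free transitions: each frame is explained
    by whichever character model matches it best."""
    return [max(row[o] for row in log_emit) for o in obs]


def spot_keyword(
    obs: Sequence[int], keyword: Sequence[int], log_emit: List[List[float]]
) -> Tuple[float, int, int]:
    """Viterbi decoding of a toy line model: filler* keyword filler*.

    States: 0 = leading filler, 1..m = keyword characters (left-to-right,
    each with a self-loop), m + 1 = trailing filler. Transition costs are
    ignored (log-probability 0). Returns (llr, start, end), where llr is the
    line-model score minus the filler-only background score and [start, end)
    is the frame span aligned to the keyword.
    """
    m, T = len(keyword), len(obs)
    fill = filler_frame_scores(obs, log_emit)
    n_states = m + 2

    def emit(state: int, t: int) -> float:
        if state == 0 or state == m + 1:
            return fill[t]
        return log_emit[keyword[state - 1]][obs[t]]

    # Allowed predecessors: self-loops plus the previous state in the chain.
    preds: Dict[int, List[int]] = {0: [0], m + 1: [m, m + 1]}
    for s in range(1, m + 1):
        preds[s] = [s - 1, s]

    # A line may start in the leading filler or directly in the keyword.
    delta = [NEG_INF] * n_states
    delta[0] = emit(0, 0)
    delta[1] = emit(1, 0)
    back = [[-1] * n_states]  # back[t][s] = best state at frame t - 1
    for t in range(1, T):
        new = [NEG_INF] * n_states
        ptr = [-1] * n_states
        for s in range(n_states):
            best_p = max(preds[s], key=lambda p: delta[p])
            if delta[best_p] > NEG_INF:
                new[s] = delta[best_p] + emit(s, t)
                ptr[s] = best_p
        delta = new
        back.append(ptr)

    # A line may end in the last keyword state or in the trailing filler.
    end_state = m if delta[m] >= delta[m + 1] else m + 1
    best = delta[end_state]
    if best == NEG_INF:
        raise ValueError("observation sequence is shorter than the keyword")

    # Backtrace to find the frames aligned to keyword states.
    path = [end_state]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    kw_frames = [t for t, s in enumerate(path) if 1 <= s <= m]
    return best - sum(fill), kw_frames[0], kw_frames[-1] + 1


if __name__ == "__main__":
    # Three characters (0, 1, 2); each emits its own symbol with probability 0.8.
    log_emit = [[math.log(0.8 if i == j else 0.1) for j in range(3)] for i in range(3)]
    llr, start, end = spot_keyword([2, 2, 0, 1, 2], [0, 1], log_emit)
    print(llr, start, end)  # expected keyword span: start=2, end=4
```

Because the free character loop can match any character, this ratio is never positive in the sketch: a value near 0 means the keyword explains its frames as well as the best unconstrained character sequence. A threshold on such a score (a hypothetical parameter here) is one way candidate keyword regions could be pruned before a further check against a lexicon-based background model.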
