Keyword Spotting in Online Handwritten Documents Containing Text and Non-text Using BLSTM Neural Networks

Spotting keywords in handwritten documents without transcribing them is valuable because it allows such documents to be searched, indexed, and classified. In this paper we show that keyword spotting based on bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks can be applied successfully to online handwritten documents that contain non-text content. The approach works even without preprocessing steps such as text vs. non-text distinction and text line extraction. We also propose a modification that can improve precision with little effort.
