Neural Word Search in Historical Manuscript Collections

We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query. This is commonly referred to as "word spotting". To this end, we first p ...
