Pho(SC)Net: An Approach Towards Zero-shot Word Image Recognition in Historical Documents

Annotating words in a historical document image archive for word image recognition demands time and skilled human resources (e.g., historians and paleographers). In a real-life scenario, obtaining sample images for every possible word is also infeasible. However, zero-shot learning methods can be used to recognize unseen, out-of-lexicon words in such historical document images. Building on previous state-of-the-art methods for word spotting and recognition, we propose a hybrid representation that uses the shape appearance of characters to differentiate between words, and which has been shown to be more effective at recognizing unseen words. This representation, termed the Pyramidal Histogram of Shapes (PHOS), is derived from the Pyramidal Histogram of Characters (PHOC), which embeds information about the occurrence and position of characters in a word. We then combine the two representations and conduct experiments to examine the effectiveness of an embedding that has the properties of both PHOS and PHOC. Encouraging results on two publicly available historical document datasets and one synthetic handwritten dataset justify the efficacy of the PHOS and the combined Pho(SC) representations.
