Zero Shot Learning Based Script Identification in the Wild

A text recognition system for natural images or video frames containing multilingual text must first identify the script in which a word is written and then recognize the word in that script. However, some scripts occur far less often than others. Because only a few samples of these rare scripts are available, supervised training of deep neural networks on them is difficult. To overcome this problem, we propose a zero-shot learning based method for script identification. We also propose an architecture for script identification that fuses a global feature vector with a semantic embedding vector. The semantic embedding of a script is obtained by modeling the spatial dependencies in its stroke sequence with a recurrent neural network. The proposed architecture outperforms the baseline approaches.
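The zero-shot idea can be illustrated with a minimal sketch. This is not the proposed architecture: it assumes that each script, including one with no training images, already has a semantic embedding, and that a word image has already been mapped into the same space. The names `cosine`, `predict_script`, `script_embeddings`, and `image_embedding`, as well as all vector values, are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_script(image_embedding, script_embeddings):
    """Return the script whose semantic embedding is most similar to the image."""
    return max(
        script_embeddings,
        key=lambda name: cosine(image_embedding, script_embeddings[name]),
    )


# Hypothetical semantic embeddings, one per script. "RareScript" stands in
# for a script that had no labeled images during training.
script_embeddings = {
    "Latin": [1.0, 0.0, 0.0],
    "Arabic": [0.0, 1.0, 0.0],
    "RareScript": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of a word image after projection into the same space.
image_embedding = [0.1, 0.2, 0.9]

predicted = predict_script(image_embedding, script_embeddings)
print(predicted)
```

Because the prediction depends only on similarity to class embeddings, a script can be added at test time by supplying its embedding, without retraining on images of that script.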
