A Blended Attention-CTC Network Architecture for Amharic Text-image Recognition

In this paper, we propose a blended Attention-Connectionist Temporal Classification (CTC) network architecture for text-image recognition of Amharic, a unique script. Amharic is an indigenous Ethiopic script that uses 34 consonant characters, each with 7 vowel variants, and 50 labialized characters that are derived, with small changes, from the 34 consonant characters. These changes modify the structure of a character by adding a straight line, shortening or elongating one of its main legs, or adding small diacritics to the right, left, top, or bottom of the character. Such small changes affect the orthographic identity of a character and result in shape similarity among characters, which makes Amharic an interesting but challenging task for OCR research. Motivated by the recent success of attention mechanisms in neural machine translation, we propose an attention-based CTC approach that blends the attention mechanism directly into the CTC network. The proposed model consists of an encoder module, an attention module, and a transcription module in a unified framework. The efficacy of the proposed model on Amharic shows that the attention mechanism allows the network to learn powerful representations by integrating information from different time steps. Our method outperforms state-of-the-art methods and achieves character error rates of 1.04% and 0.93% on the ADOCR test datasets.
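The character error rate quoted above is commonly computed as the total edit distance between predicted and ground-truth text lines, divided by the total number of ground-truth characters. The following is a minimal sketch under that assumption; the abstract does not state the exact formula used, and the function names `edit_distance` and `character_error_rate` are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """CER in percent over a set of text lines (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return 100.0 * total_edits / total_chars
```

Because each Ethiopic character is a single Unicode code point, confusing two similar shapes such as ላ and ለ counts as exactly one substitution under this measure.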
