Ancient Text Character Recognition Using Deep Learning

Ancient scripts offer a captivating insight into the knowledge of our ancestors, and this knowledge needs to be preserved for future generations. There is therefore a need to convert digitized scripts, which are often available only as degraded images, into text. To accomplish this, the paper proposes a model that first binarizes the images using a selectional encoder-decoder technique. The results show a binarization accuracy of approximately 74.24% and an F-measure of approximately 75%, higher than those of previously developed models. The binarized images are then segmented at the character level using the seam carving method, and the segments are manually compared with the vocabulary; the segmentation accuracy (As) is approximately 70%. The characters are then recognized using a three-layer Convolutional Neural Network, with a recognition accuracy (Ar) of approximately 73%. Finally, the recognized images are converted into text using a one-to-one mapping, so that the text can later be translated into a widely understood language such as English.
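Binarization accuracy and F-measure are pixel-level scores computed against a ground-truth mask. The sketch below shows one way to compute them. It assumes, since the abstract does not say, that the encoder-decoder outputs a per-pixel ink probability, that this map is thresholded at 0.5, and that ink pixels are encoded as 1. The function names `threshold_map` and `binarization_metrics` are hypothetical.

```python
# Minimal sketch: pixel-level binarization scores. Assumptions (not from the paper):
# the model emits a per-pixel ink probability, 0.5 is the cut-off, and ink = 1.

def threshold_map(probabilities, threshold=0.5):
    """Turn a 2-D map of ink probabilities into a 0/1 mask (1 = ink)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def binarization_metrics(predicted, ground_truth):
    """Return (accuracy, precision, recall, f_measure) for two same-shape 0/1 masks."""
    if len(predicted) != len(ground_truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = tn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        if len(pred_row) != len(gt_row):
            raise ValueError("masks must have the same number of columns")
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1      # ink correctly detected
            elif p:
                fp += 1      # background wrongly marked as ink
            elif g:
                fn += 1      # ink missed
            else:
                tn += 1      # background correctly left blank
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


probs = [[0.9, 0.6, 0.1], [0.2, 0.8, 0.4]]
truth = [[1, 0, 0], [0, 1, 1]]
acc, prec, rec, f1 = binarization_metrics(threshold_map(probs), truth)
print(f"accuracy={acc:.3f} F-measure={f1:.3f}")  # prints: accuracy=0.667 F-measure=0.667
```

Accuracy rewards correctly classified background as much as ink, so on mostly blank palm-leaf pages the F-measure, which ignores true negatives, is usually the more telling of the two scores.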

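Seam carving finds a connected top-to-bottom path of minimum cumulative energy by dynamic programming, letting the path shift by at most one column per row. The following is a simplified sketch of how such a seam could separate two adjacent characters. It assumes the binarized image itself (ink = 1) serves as the energy map, so the cheapest seam threads through background; the abstract does not specify the energy function or how search windows are chosen, and `min_vertical_seam` and `seam_in_window` are hypothetical names.

```python
# Simplified seam-carving sketch for splitting adjacent characters.
# Assumption (not from the paper): the binarized image (ink = 1) is the energy map.

def min_vertical_seam(energy):
    """Return (columns, cost): one column per row of the cheapest top-to-bottom seam."""
    if not energy or not energy[0]:
        raise ValueError("energy map must be non-empty")
    rows, cols = len(energy), len(energy[0])
    cost = [list(energy[0])]          # cumulative cost of the best seam ending at (r, c)
    back = [[0] * cols]               # predecessor column for each cell
    for r in range(1, rows):
        prev = cost[-1]
        row_cost, row_back = [], []
        for c in range(cols):
            # A seam may move at most one column left or right per row.
            best = min(range(max(0, c - 1), min(cols, c + 2)), key=lambda k: prev[k])
            row_cost.append(energy[r][c] + prev[best])
            row_back.append(best)
        cost.append(row_cost)
        back.append(row_back)
    end = min(range(cols), key=lambda k: cost[-1][k])
    total = cost[-1][end]
    seam = [end]
    for r in range(rows - 1, 0, -1):  # trace predecessors back to the top row
        seam.append(back[r][seam[-1]])
    seam.reverse()
    return seam, total


def seam_in_window(binary, lo, hi):
    """Search for a seam only between columns lo (inclusive) and hi (exclusive)."""
    window = [row[lo:hi] for row in binary]
    seam, total = min_vertical_seam(window)
    return [c + lo for c in seam], total


# Two "characters" (ink = 1) separated by a slanted background gap.
page = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
seam, cost = seam_in_window(page, 1, 4)
print(seam, cost)  # prints: [2, 1, 2] 0
```

A zero-cost seam crosses no ink, so it separates the two characters without cutting a stroke; unlike a straight vertical cut, it can follow a slanted or curved gap.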