Artistic Multi-character Script Identification Using Iterative Isotropic Dilation Algorithm

This work addresses a new script identification problem: artistic multi-character script identification. Two datasets of artistic documents/images in Bangla, Devanagari and Roman scripts are used: one of real-life artistic multi-character script images and one of synthetic artistic multi-character script images. After binarization using Otsu’s algorithm, some character images were found to be broken into several components. To overcome this, a novel iterative isotropic dilation algorithm is proposed to merge the components into a single-component object. Two types of features, shape-based and texture-based, are then considered. A discrete Gabor wavelet with 2 scales and 4 orientations is used for texture feature extraction, and PCA is used to reduce the dimensionality of the texture feature space. The proposed algorithm has been tested with different machine learning classifiers, and promising accuracy has been observed.
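The abstract does not give the details of the iterative isotropic dilation step, so the following is only a minimal sketch of the general idea under stated assumptions: the binarized character is a grid of 0/1 values, components are counted with 8-connectivity, the isotropic structuring element is a discrete disk of a fixed radius, and dilation repeats until a single component remains. The function names, the `radius` and `max_iters` parameters, and the stopping rule are illustrative choices, not the authors' implementation.

```python
from collections import deque


def count_components(img):
    """Count 8-connected foreground components in a binary grid (rows of 0/1)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


def disk_offsets(radius):
    """Offsets of a discrete disk: the same reach in every direction (isotropic)."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def dilate(img, offsets):
    """One binary dilation: stamp the structuring element on every foreground pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out


def iterative_isotropic_dilation(img, radius=1, max_iters=50):
    """Dilate repeatedly until one component remains (assumed stopping rule).

    Returns the dilated grid and the number of dilation passes applied.
    """
    offsets = disk_offsets(radius)
    current = [row[:] for row in img]
    iters = 0
    while count_components(current) > 1 and iters < max_iters:
        current = dilate(current, offsets)
        iters += 1
    return current, iters


# Toy "broken character": two strokes separated by a three-pixel gap.
broken = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
merged, passes = iterative_isotropic_dilation(broken)
```

Stopping as soon as one component remains keeps the stroke thickening to the minimum needed to bridge the gaps, which matters if shape-based features are computed afterwards. A larger `radius` closes gaps in fewer passes at the cost of coarser control over the final shape.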
