Pattern detection and recognition using over-complete and sparse representations

Recent research in harmonic analysis and on mammalian vision systems has revealed that over-complete and sparse representations play an important role in visual information processing. Applying such representations to pattern recognition and detection problems has become an interesting field of study. The main contribution of this thesis is two feature extraction strategies, the global strategy and the local strategy, that make use of these representations.

In the global strategy, over-complete and sparse transformations are applied to the input pattern as a whole, and features are extracted in the transformed domain. This strategy has been applied to rotation-invariant texture classification and script identification using the Ridgelet transform. Experimental results show better performance than the Gabor multi-channel filtering method and wavelet-based methods.

The local strategy is divided into two stages. The first stage analyzes the local over-complete and sparse structure: the input 2-D patterns are divided into patches, and the structure is learned from these patches using sparse approximation techniques. The second stage applies the learned structure. For object detection, we propose a sparsity testing technique, in which a local over-complete and sparse structure is built to give sparse representations to text patterns and non-sparse representations to other patterns. Objects are then detected by identifying patterns that can be sparsely represented by the learned structure. This technique has been applied to detect text in scene images, with a recall rate of 75.23% (about 6% improvement over other works) and a precision rate of 67.64% (about 12% improvement).
For applications such as character or shape recognition, the learned over-complete and sparse structure is combined with a Convolutional Neural Network (CNN). A second text detection method based on this combination is proposed to further improve the accuracy of text detection in scene images (about 11% higher than our first method based on sparsity testing). Finally, this method has been applied to handwritten Farsi numeral recognition, where it obtains a 99.22% recognition rate on the CENPARMI database and a 99.5% recognition rate on the HODA database. By comparison, an SVM with gradient features achieves recognition rates of 98.98% and 99.22% on these databases, respectively.
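
The sparsity testing idea described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the abstract does not say which sparse approximation algorithm or decision rule is used, so the sketch assumes orthogonal matching pursuit and a relative-residual threshold. The names `omp`, `is_sparse`, `k` and `tol` are hypothetical, and the random dictionary in the demo is only a stand-in for a structure learned from text patches.

```python
import numpy as np


def omp(D, x, k):
    """Greedy orthogonal matching pursuit (assumed here, not confirmed by the thesis).

    Approximates x using at most k columns (atoms) of the dictionary D.
    Returns the selected atom indices, their coefficients, and the residual.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new atom improves the fit
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef, residual


def is_sparse(D, x, k=3, tol=0.1):
    """Hypothetical sparsity test.

    Returns True when k atoms of D reconstruct x with a relative
    residual norm of at most tol.
    """
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return False  # an empty patch carries no evidence of text
    _, _, residual = omp(D, x, k)
    return bool(np.linalg.norm(residual) / norm <= tol)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in over-complete dictionary: 128 unit-norm atoms for 64-dim patches.
    D = rng.standard_normal((64, 128))
    D /= np.linalg.norm(D, axis=0)
    text_like = 2.0 * D[:, 5] - 1.5 * D[:, 40]  # built from two atoms
    other = rng.standard_normal(64)             # unstructured patch
    print(is_sparse(D, text_like), is_sparse(D, other))
```

In a detector along these lines, `is_sparse` would be evaluated on each vectorized image patch, and patches that pass the test would be kept as text candidates. The values of `k` and `tol` are illustrative and would need tuning on labeled data.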
