Detection and Classification of Interesting Parts in Scanned Documents by Means of AdaBoost Classification and Low-Level Features Verification

This paper presents a novel approach to detecting and identifying selected parts of a document (stamps, logos, printed text blocks, signatures, and tables) in digital images obtained by scanning paper documents. The task is carried out in two main steps. The first is element detection, performed by an AdaBoost cascade of weak classifiers. In the second step, the resulting image blocks are subjected to a verification process. Eight feature vectors based on recently proposed descriptors were selected and combined with six different classifiers that represent a range of approaches to data classification. Experiments performed on a large set of paper document images gathered from the Internet gave encouraging results.
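The two-step structure described above (detect candidate blocks, then verify each one) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_detector` stands in for the AdaBoost cascade, `ink_ratio` is a hypothetical low-level feature rather than one of the eight feature vectors, and the 0.5 threshold in `toy_verifier` is an arbitrary assumption replacing the trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of 0..255 values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Candidate:
    label: str  # class proposed by the detector, e.g. "stamp"
    box: Box


def crop(image: Image, box: Box) -> Image:
    """Cut the image block covered by the box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def ink_ratio(block: Image, threshold: int = 128) -> float:
    """Hypothetical low-level feature: fraction of pixels darker than threshold."""
    pixels = [p for row in block for p in row]
    return sum(p < threshold for p in pixels) / len(pixels) if pixels else 0.0


def detect_then_verify(
    image: Image,
    detector: Callable[[Image], List[Candidate]],
    verifier: Callable[[str, Image], bool],
) -> List[Candidate]:
    """Step 1: the detector proposes blocks. Step 2: the verifier filters them."""
    candidates = detector(image)
    return [c for c in candidates if verifier(c.label, crop(image, c.box))]


def toy_detector(image: Image) -> List[Candidate]:
    # Stand-in for the cascade: proposes two fixed boxes, one of them wrong.
    return [Candidate("stamp", (0, 0, 2, 2)), Candidate("stamp", (2, 2, 2, 2))]


def toy_verifier(label: str, block: Image) -> bool:
    # Stand-in for a trained classifier: accept blocks that are mostly ink.
    return ink_ratio(block) >= 0.5


# A 4x4 "page" with a dark 2x2 block in the top-left corner.
page = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
kept = detect_then_verify(page, toy_detector, toy_verifier)
```

Passing the detector and verifier as callables keeps the two stages independent, so a real cascade detector or a different feature and classifier pair could be substituted without changing `detect_then_verify`.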
