Document-zone classification using partial least squares and hybrid classifiers

This paper introduces a novel document-zone classification algorithm. Low-level image features are first extracted from document zones, and partial least squares (PLS) is then applied to each pair of classes to compute discriminating pairwise features. Rather than using the popular one-against-all or one-against-one voting schemes, we introduce a hybrid method that combines the benefits of both. Applied to the University of Washington dataset, the algorithm achieves 97.3% classification accuracy.
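
The pairwise feature step can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a single PLS component per class pair, computed as the first PLS1 weight vector (the normalized product of the centered feature matrix and the centered +1/-1 labels). The function names, the toy two-dimensional features, and the class names are hypothetical, and the hybrid voting scheme is not shown because the abstract does not specify it.

```python
from itertools import combinations
from math import sqrt


def pls_direction(X, y):
    """First PLS1 weight vector: w is proportional to Xc^T yc (X and y centered)."""
    n, d = len(X), len(X[0])
    mean_x = [sum(row[j] for row in X) / n for j in range(d)]
    mean_y = sum(y) / n
    w = [
        sum((row[j] - mean_x[j]) * (t - mean_y) for row, t in zip(X, y))
        for j in range(d)
    ]
    norm = sqrt(sum(v * v for v in w)) or 1.0  # guard against a zero vector
    return mean_x, [v / norm for v in w]


def fit_pairwise_pls(X, labels):
    """Fit one PLS direction per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        # Keep only the samples of the two classes; class a is +1, class b is -1.
        pair_rows = [(x, 1.0 if c == a else -1.0)
                     for x, c in zip(X, labels) if c in (a, b)]
        X_pair = [x for x, _ in pair_rows]
        y_pair = [t for _, t in pair_rows]
        models[(a, b)] = pls_direction(X_pair, y_pair)
    return models


def pairwise_features(models, x):
    """Project one sample onto every pairwise direction (one score per pair)."""
    return {
        pair: sum((xj - mj) * wj for xj, mj, wj in zip(x, mean_x, w))
        for pair, (mean_x, w) in models.items()
    }


# Toy low-level features for three hypothetical zone classes.
X = [[1.0, 0.0], [1.2, 0.1],      # text
     [0.0, 1.0], [0.1, 1.2],      # table
     [-1.0, -1.0], [-1.2, -0.9]]  # halftone
labels = ["text", "text", "table", "table", "halftone", "halftone"]

models = fit_pairwise_pls(X, labels)
feats = pairwise_features(models, [1.1, 0.05])  # a text-like zone
for pair, score in sorted(feats.items()):
    print(pair, round(score, 3))
```

For a pair (a, b), a positive score places the sample on the side of class a and a negative score on the side of class b. These scores could then feed any pairwise or one-against-all decision rule.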
