Features for printed document image analysis

This paper presents features for separating text and non-text areas in printed document images. It first introduces entropic discrimination, a simple separation method that relies on a single feature. It then briefly reviews the texture and geometric discriminant parameters proposed in previous research (2001, 2002). Several of these parameters are examined statistically.
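The abstract does not say how the entropic feature is computed. Here is a minimal sketch of single-feature discrimination. It assumes the feature is the Shannon entropy of a block's horizontal projection profile, normalized by its maximum possible value, and that one threshold separates the two classes. The function names `normalized_projection_entropy` and `classify_block`, the normalization, and the threshold of 0.9 are hypothetical illustration choices, not the paper's definitions.

```python
import numpy as np


def normalized_projection_entropy(block: np.ndarray) -> float:
    """Entropy of the horizontal projection profile, scaled to [0, 1].

    Hypothetical feature definition: `block` is a binary image
    (1 = ink, 0 = background) indexed as rows x columns.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = block.astype(float).sum(axis=1)
    total = profile.sum()
    n_rows = profile.size
    if total == 0 or n_rows < 2:
        return 0.0
    p = profile / total
    p = p[p > 0]  # by convention 0 * log 0 = 0
    entropy = -np.sum(p * np.log2(p))
    # Divide by the maximum entropy (uniform profile) so blocks of
    # different heights can be compared with one threshold.
    return float(entropy / np.log2(n_rows))


def classify_block(block: np.ndarray, threshold: float = 0.9) -> str:
    """Single-feature decision: compare the entropy to a threshold.

    Assumption: text lines separated by blank gaps concentrate ink in
    fewer rows, giving lower entropy than image or graphic regions.
    The threshold value is a placeholder, not taken from the paper.
    """
    if not block.any():
        return "background"
    score = normalized_projection_entropy(block)
    return "text" if score < threshold else "non-text"


# Usage on two synthetic blocks.
text_like = np.zeros((20, 30), dtype=np.uint8)
text_like[(np.arange(20) % 5) < 3, :] = 1  # 3 ink rows, then 2 blank rows
image_like = np.ones((20, 30), dtype=np.uint8)  # ink spread over every row

print(classify_block(text_like))   # text
print(classify_block(image_like))  # non-text
```

In practice, the threshold would be estimated from labeled blocks. The direction of the test (text below the threshold, non-text above) follows only from this sketch's assumption about interline gaps.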

[1] Friedrich M. Wahl et al. Document Analysis System. IBM J. Res. Dev., 1982.

[2] Kuo-Chin Fan et al. Segmentation and classification of mixed text/graphics/image documents. Pattern Recognit. Lett., 1994.

[3] Robert M. Haralick et al. Document image understanding: geometric and logical layout. Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[4] Kuo-Chin Fan et al. Classification of document blocks using density feature and connectivity histogram. Pattern Recognit. Lett., 1995.

[5] Thomas G. Dietterich. What is machine learning? Archives of Disease in Childhood, 2020.

[6] Anil K. Jain et al. Document Representation and Its Application to Page Decomposition. IEEE Trans. Pattern Anal. Mach. Intell., 1998.

[7] Matti Pietikäinen et al. Page Segmentation and Zone Classification: The State of the Art. 1999.

[8] Anil K. Jain et al. Statistical Pattern Recognition: A Review. IEEE Trans. Pattern Anal. Mach. Intell., 2000.

[9] Vladimir N. Vapnik et al. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.

[10] George Nagy et al. Twenty Years of Document Image Analysis in PAMI. IEEE Trans. Pattern Anal. Mach. Intell., 2000.

[11] Ching Y. Suen et al. KMOD - a new support vector machine kernel with moderate decreasing for pattern recognition. Application to digit image recognition. Proceedings of the Sixth International Conference on Document Analysis and Recognition, 2001.

[12] Ching Y. Suen et al. Extraction of text areas in printed document images. DocEng '01, 2001.

[13] Corinna Cortes et al. Support-Vector Networks. Machine Learning, 1995.