Features for neural net based region identification of newspaper documents

Several features for neural-network-based document region identification are tested, with a focus on features for identifying non-text regions. The region identification algorithm is a key component of a document recognition system that segments a document into regions, classifies each region as text, graphic, photo, or another region type, and then uses this classification to guide the processing and analysis of the image. The input data are unusually challenging: low-quality images of newspaper documents obtained from microfilmed archives. The results compare favorably with those reported in the literature.
