Predictive coding for document layout characterization

We propose a new approach to document image layout extraction that combines rapid feature analysis, preclassification, and predictive coding. First, a set of layout features is used to render the image profile information. A knowledge base then applies rules to map these initial regions to layout labels. Each region receives a classification tag and a degree of membership in the background, text, picture, and line-drawing classes. A predictive coding method then uses the preclassification information to increase the confidence of each label and to integrate each region's domain and labels into a uniform class without any shape assumption. We have tested our technique on three different databases that together comprise over 1000 document images. The results show a high degree of confidence in region separation and extraction. The main benefits are robust classification, shape independence, and rapid computation.
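The abstract does not specify the predictor, so the following is only an illustrative sketch of how preclassified memberships might be reinforced by prediction from already-processed neighbours. It assumes the page is divided into a grid of blocks, each carrying a membership vector over the four classes, and it blends each block's observed memberships with a prediction averaged from its left and upper neighbours. The names `refine_memberships`, `label_grid`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the grid layout, the causal (left/upper) predictor,
# and the blending weight are assumptions, not the paper's published method.
CLASSES = ("background", "text", "picture", "line_drawing")


def normalize(membership):
    """Scale a membership vector so its entries sum to 1."""
    total = sum(membership)
    if total == 0:
        return [1.0 / len(membership)] * len(membership)
    return [value / total for value in membership]


def refine_memberships(grid, weight=0.5):
    """Blend each block's memberships with a prediction from causal neighbours.

    grid is a list of rows; each cell is a list of len(CLASSES) memberships.
    weight is the share given to the neighbour-based prediction (0 to 1).
    """
    refined = []
    for r, row in enumerate(grid):
        refined_row = []
        for c, observed in enumerate(row):
            observed = normalize(observed)
            # Causal neighbours: blocks already refined in raster-scan order.
            neighbours = []
            if c > 0:
                neighbours.append(refined_row[c - 1])
            if r > 0:
                neighbours.append(refined[r - 1][c])
            if neighbours:
                predicted = [
                    sum(n[k] for n in neighbours) / len(neighbours)
                    for k in range(len(CLASSES))
                ]
                blended = [
                    (1 - weight) * o + weight * p
                    for o, p in zip(observed, predicted)
                ]
            else:
                blended = observed  # first block has nothing to predict from
            refined_row.append(normalize(blended))
        refined.append(refined_row)
    return refined


def label_grid(grid):
    """Assign each block the class with the highest membership."""
    return [
        [CLASSES[max(range(len(CLASSES)), key=cell.__getitem__)] for cell in row]
        for row in grid
    ]


if __name__ == "__main__":
    # One row of three blocks; the middle block is ambiguous before refinement.
    page = [[[0.0, 0.9, 0.1, 0.0], [0.0, 0.45, 0.55, 0.0], [0.0, 0.9, 0.1, 0.0]]]
    print(label_grid(page))
    print(label_grid(refine_memberships(page)))
```

Because the prediction works block by block, regions grow along whatever outline the memberships support, which is one way to avoid assuming rectangular regions. A larger `weight` smooths labels more aggressively and risks absorbing small regions into their neighbours.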
