DAN: An Automatic Segmentation and Classification Engine for Paper Documents

Recognition of paper documents is fundamental to office automation, which is becoming an ever more powerful tool in those fields where information is still kept on paper. Document recognition starts from data acquisition, from both journals and entire books, in order to transform them into digital objects. We present DAN (Document Analysis on Network), a new document recognition system that follows Open Source methodologies and uses XML descriptions for document segmentation and classification, an approach that proves beneficial in terms of both classification precision and general-purpose availability.
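The abstract does not specify DAN's XML schema. As a minimal sketch, assuming a hypothetical format in which each segmented region of a page is stored as a `<region>` element carrying its bounding box and its class label, the following Python code serializes a classified page layout and reads the labels back with the standard library's `xml.etree.ElementTree`. The element and attribute names (`page`, `region`, `type`, `x`, `y`, `width`, `height`), the `Region` record and the helpers `describe_page` and `read_labels` are illustrative assumptions, not DAN's actual format or API.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical record for one segmented block: its bounding box and the
# class label assigned by a classifier. Field names are illustrative only.
@dataclass
class Region:
    x: int
    y: int
    width: int
    height: int
    label: str  # e.g. "title", "text", "image"


def describe_page(page_id: str, regions: list[Region]) -> str:
    """Serialize segmented, classified regions as an XML string (hypothetical schema)."""
    page = ET.Element("page", id=page_id)
    for i, r in enumerate(regions):
        # One <region> element per block; XML attribute values must be strings.
        ET.SubElement(
            page,
            "region",
            id=f"{page_id}-r{i}",
            type=r.label,
            x=str(r.x),
            y=str(r.y),
            width=str(r.width),
            height=str(r.height),
        )
    return ET.tostring(page, encoding="unicode")


def read_labels(xml_text: str) -> list[str]:
    """Recover the class labels, in document order, from an XML page description."""
    root = ET.fromstring(xml_text)
    return [region.get("type") for region in root.findall("region")]


if __name__ == "__main__":
    regions = [
        Region(40, 60, 500, 80, "title"),
        Region(40, 160, 240, 600, "text"),
        Region(300, 160, 240, 300, "image"),
    ]
    xml_text = describe_page("p1", regions)
    print(xml_text)
    print(read_labels(xml_text))
```

Keeping segmentation geometry and classification labels together in a single XML description means that any downstream tool able to parse XML can consume the result without depending on the engine that produced it, which is one plausible reading of the general-purpose availability the abstract claims.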
