Neuro-fuzzy ART-based document management system: application to mail distribution and digital libraries

Abstract This paper proposes a new document management system whose kernel is based on two new neuro-fuzzy systems of the ART family: FasArt and RFasArt. FasArt supports a simple Optical Character Recognition (OCR) module that inherits desirable properties of ART architectures, such as fast and incremental learning, stability and modularity. RFasArt is a new recurrent version of FasArt that efficiently exploits contextual information in the task of logical labeling. The proposed system is extensively tested in two real-world applications: e-mail distribution of printed business letters and a digital library of scientific papers. Experimental results show logical labeling and OCR rates over 90%. The proposed system compares favorably with a previous system proposed by the group, which, instead of using contextual information in an integrated way, employed a Viterbi-based postprocessing model.
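To illustrate what Viterbi-based postprocessing of logical labels can look like, the following minimal sketch decodes the most probable label sequence from per-block classifier scores and label-to-label transition probabilities. It is a generic Viterbi decoder, not the model used in the previous system; the label names (`sender`, `date`, `body`), all probabilities, and the function name `viterbi_labels` are assumptions made for illustration only.

```python
import math

def viterbi_labels(block_scores, transition, initial):
    """Return the most probable label sequence for a list of text blocks.

    block_scores: one dict per block, label -> classifier score (hypothetical).
    transition:   dict of dicts, transition[prev][cur] -> probability.
    initial:      dict, label -> probability of starting with that label.
    """
    labels = list(initial)
    log = lambda p: math.log(max(p, 1e-12))  # floor avoids log(0)

    # Log-score of the best path ending in each label at the first block.
    best = {l: log(initial[l]) + log(block_scores[0][l]) for l in labels}
    back = []  # back-pointers, one dict per later block

    for scores in block_scores[1:]:
        prev_best, best, pointers = best, {}, {}
        for cur in labels:
            # Best predecessor for label `cur` at this block.
            prev = max(labels, key=lambda p: prev_best[p] + log(transition[p][cur]))
            best[cur] = prev_best[prev] + log(transition[prev][cur]) + log(scores[cur])
            pointers[cur] = prev
        back.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda l: best[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Made-up example: three blocks of a business letter.
initial = {"sender": 0.8, "date": 0.1, "body": 0.1}
transition = {
    "sender": {"sender": 0.1, "date": 0.8, "body": 0.1},
    "date":   {"sender": 0.1, "date": 0.1, "body": 0.8},
    "body":   {"sender": 0.1, "date": 0.1, "body": 0.8},
}
block_scores = [
    {"sender": 0.70, "date": 0.20, "body": 0.10},
    {"sender": 0.40, "date": 0.35, "body": 0.25},  # ambiguous block
    {"sender": 0.10, "date": 0.20, "body": 0.70},
]

decoded = viterbi_labels(block_scores, transition, initial)
independent = [max(s, key=s.get) for s in block_scores]
print(decoded)      # context-aware labels
print(independent)  # per-block labels without context
```

In this toy example the second block is labeled `sender` when each block is classified on its own, while the decoder assigns `date` because the transition probabilities favor a date after the sender. A recurrent classifier such as RFasArt aims to use this kind of context inside the classifier itself rather than in a separate postprocessing pass.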
