Multistrategy Learning for Document Recognition

This paper proposes a methodology for document classification and understanding based on a multistrategy approach to learning from examples. Document classification is the process of identifying the class to which a document belongs; document understanding is the process of detecting a document's logical structure. The approach has been implemented in a system called PLRS, which embeds two empirical learning systems: RES and INDUBI/H. Given a set of training documents whose layout structure has already been detected and whose membership class has been defined by the user, RES generates the knowledge base of an expert system devoted to document classification. Both the layout of the training documents and the learned rules are described in a first-order language. The learning methodology adopted for the problem of learning classification rules integrates both a paramet...
