Paper information - Incremental machine learning techniques for document layout understanding

Incremental machine learning techniques for document layout understanding

In real-world digital libraries, artificial intelligence techniques are essential for tackling the automatic document processing task with sufficient flexibility. The great variability in document kind, content and shape requires powerful representation formalisms to capture the full complexity of the domain. The continuous flow of new documents requires adaptable techniques that can progressively adjust the knowledge acquired about documents as new evidence becomes available, extending the set of recognized document types if needed. Neither of these issues has yet been thoroughly studied. This paper presents an incremental first-order logic learning framework for automatically dealing with various kinds of evolution in the content of digital repositories: evolution in the definitions of classes, evolution in the set of known classes, and evolution by the addition of new, previously unknown classes. Experiments show that the approach can be applied to real-world settings.

Stefano Ferilli | Floriana Esposito | Marenglen Biba | Teresa Maria Altomare Basile
