Incremental machine learning techniques for document layout understanding

In real-world digital libraries, artificial intelligence techniques are essential for tackling the automatic document processing task with sufficient flexibility. The great variability in document kind, content and shape requires powerful representation formalisms to catch all the domain complexity. The continuous flow of new documents requires adaptable techniques that can progressively adjust the acquired knowledge on documents as long as new evidence becomes available, even extending if needed the set of recognized document types. Both these issues have not yet been thoroughly studied. This paper presents an incremental first-order logic learning framework for automatically dealing with various kinds of evolution in digital repositories content: evolution in the definition of class definitions, evolution in the set of known classes and evolution by addition of new unknown classes. Experiments show that the approach can be applied to real-world.