Document annotation by active learning techniques

We present a system for the semantic annotation of layout-oriented documents, with an integrated learning component. We introduce probabilistic learning methods on tree-like documents and we present different active learning techniques for training document annotation models. We report some preliminary results of deploying such active learning techniques on an important case of document collection annotation.