Two Hierarchical Text Categorization Approaches for the BioASQ Semantic Indexing Challenge

This paper describes our participation in the BioASQ semantic indexing challenge with two hierarchical text categorization systems. Both systems originate from previous research on thesaurus topic assignment applied to small domains in the legal document management field. The first system employs a classical top-down approach based on a collection of local classifiers. The second builds a Bayesian network induced by the structure and contents of the thesaurus, taking into account descriptor labels and related terms. We describe the adaptations required to deal with a thesaurus as large as MeSH and with a very large document collection. We also discuss the results obtained in the BioASQ challenge and the limitations of both approaches.
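As a rough illustration of the general top-down scheme mentioned above, the following sketch routes a document through a small descriptor hierarchy, consulting one local classifier per node and descending only into the children of accepted nodes. It is not the system described in this paper: the `Node` type, the `keyword_classifier` stand-in for a trained classifier, the threshold value, and the four descriptor labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A local classifier maps a document to a relevance score in [0, 1].
LocalClassifier = Callable[[str], float]


@dataclass
class Node:
    """One descriptor in the hierarchy, with its own local classifier."""
    label: str
    classifier: LocalClassifier
    children: List["Node"] = field(default_factory=list)


def keyword_classifier(*keywords: str) -> LocalClassifier:
    """Toy stand-in for a trained model: fraction of keywords found in the text."""
    def score(doc: str) -> float:
        text = doc.lower()
        return sum(1 for k in keywords if k in text) / len(keywords)
    return score


def top_down_assign(roots: List[Node], doc: str, threshold: float = 0.5) -> List[str]:
    """Return the labels of all nodes accepted while descending from the roots.

    A node's children are examined only if the node itself is accepted,
    so a rejected node prunes its whole subtree.
    """
    assigned = []
    frontier = list(roots)
    while frontier:
        node = frontier.pop()
        if node.classifier(doc) >= threshold:
            assigned.append(node.label)
            frontier.extend(node.children)
    return sorted(assigned)


# Hypothetical two-level hierarchy; labels and keywords are illustrative only.
hierarchy = [
    Node("Diseases", keyword_classifier("patient"), [
        Node("Neoplasms", keyword_classifier("tumor", "cancer")),
        Node("Infections", keyword_classifier("virus", "bacteria")),
    ]),
    Node("Chemicals", keyword_classifier("compound"), [
        # Would match the document below, but is never reached.
        Node("Enzymes", keyword_classifier("tumor")),
    ]),
]

doc = "Tumor growth in cancer patients"
print(top_down_assign(hierarchy, doc))
```

The "Enzymes" node shows the main trade-off of this design: pruning keeps the number of classifier evaluations small on a large thesaurus, but an error at an upper level cannot be recovered further down.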