Cross-language MeSH Indexing using Morpho-Semantic Normalization

We consider three alternative procedures for the automatic indexing of medical documents, using MeSH thesaurus identifiers as target units (document descriptors). Rather than taking complete words as the starting point of the indexing procedure, we propose morphologically plausible subwords as the basic units from which MeSH terms are derived. We describe the morphological segmentation and normalization procedures, as well as the mappings from subwords to MeSH terms, and discuss results from an evaluation carried out on a German-language corpus.
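
To make the idea of subword-based indexing concrete, the following is a minimal sketch and not the procedure evaluated in this work. It assumes a toy subword lexicon, a greedy longest-match segmenter, and a lookup from sets of language-independent identifiers to descriptor names. The lexicon entries, the identifier names (`#stomach`, `#inflamm`), the functions `segment`, `normalize`, and `index_document`, and the transliteration of the German umlaut ("ue" for "ü") are all hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy lexicon: subword -> language-independent identifier.
# An empty identifier marks a subword without semantic content (e.g. a suffix).
SUBWORD_CLASSES: Dict[str, str] = {
    "gastr": "#stomach",
    "magen": "#stomach",      # German "Magen" (stomach)
    "itis": "#inflamm",
    "entzuend": "#inflamm",   # German "Entzuend-" (inflammation), transliterated
    "ung": "",                # German nominal suffix, carries no meaning here
}

# Hypothetical mapping: set of identifiers -> descriptor name.
MESH_INDEX: Dict[FrozenSet[str], str] = {
    frozenset({"#stomach", "#inflamm"}): "Gastritis",
}


def segment(word: str) -> List[str]:
    """Split a word into lexicon subwords by greedy longest match.

    Characters that start no known subword are skipped.
    """
    word = word.lower()
    subwords: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORD_CLASSES:
                subwords.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no subword starts here; move on by one character
    return subwords


def normalize(text: str) -> List[str]:
    """Replace each meaningful subword in the text by its identifier."""
    identifiers: List[str] = []
    for token in text.split():
        for subword in segment(token):
            identifier = SUBWORD_CLASSES[subword]
            if identifier:  # drop subwords without semantic content
                identifiers.append(identifier)
    return identifiers


def index_document(text: str) -> List[str]:
    """Return descriptors whose identifier set is fully covered by the text."""
    found = set(normalize(text))
    return sorted(name for key, name in MESH_INDEX.items() if key <= found)


if __name__ == "__main__":
    print(segment("Magenentzuendung"))
    print(index_document("Magenentzuendung"))
    print(index_document("chronic gastritis"))
```

In this sketch the German compound and the English term are reduced to the same identifier set, so both documents receive the same descriptor. That is the property that makes a subword-level representation attractive for cross-language indexing, under the stated assumptions.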