Multiple Approaches to Fine-Grained Indexing of the Biomedical Literature

The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents continue to be indexed with high quality, it is necessary to develop tools and methods that help indexers in their daily work. We present three methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations. The methods (dictionary-based, post-processing rules, and Natural Language Processing rules) are described and evaluated on a genetics-related corpus. The best overall performance is obtained for the subheading genetics (70% precision and 17% recall with post-processing rules; 48% precision and 37% recall with the dictionary-based method). Future work will extend these methods to all MeSH subheadings and study method combination more thoroughly.
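The abstract names the methods without detailing them. As a rough illustration of what a dictionary-based pair recommender and its precision/recall evaluation might look like, here is a minimal Python sketch. It is not the authors' implementation: the trigger dictionary `SUBHEADING_TRIGGERS`, the functions `recommend_pairs` and `precision_recall`, and the example headings are all hypothetical. The naive rule of pairing every candidate main heading with every triggered subheading is likewise an assumption made for brevity.

```python
import re
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (MeSH main heading, subheading)

# Hypothetical dictionary: subheading -> words whose presence in the text
# suggests that subheading. Entries are illustrative, not from the paper.
SUBHEADING_TRIGGERS: Dict[str, Set[str]] = {
    "genetics": {"gene", "genes", "mutation", "mutations", "allele", "polymorphism"},
    "metabolism": {"metabolism", "metabolic", "enzyme", "enzymes"},
}


def recommend_pairs(text: str, main_headings: List[str]) -> Set[Pair]:
    """Naively pair every candidate main heading with every subheading
    whose trigger words occur in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    pairs: Set[Pair] = set()
    for subheading, triggers in SUBHEADING_TRIGGERS.items():
        if tokens & triggers:  # at least one trigger word found
            for heading in main_headings:
                pairs.add((heading, subheading))
    return pairs


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision = correct / recommended; recall = correct / gold-standard pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "A novel mutation in the BRCA1 gene was identified."
    # Hypothetical candidate main headings and gold-standard indexing.
    predicted = recommend_pairs(text, ["Breast Neoplasms", "BRCA1 Protein"])
    gold = {("Breast Neoplasms", "genetics"), ("BRCA1 Protein", "metabolism")}
    print(sorted(predicted))
    print(precision_recall(predicted, gold))  # (0.5, 0.5)
```

Here the words "mutation" and "gene" trigger the subheading genetics for both candidate main headings; one of the two recommended pairs matches the hypothetical gold indexing, giving 50% precision and 50% recall. These are the same two measures the abstract reports for each method.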
