A recent advance in the automatic indexing of the biomedical literature

The volume of biomedical literature has grown explosively in recent years. This growth is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them handle the resulting workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus on the new subheading attachment feature of NLM's Medical Text Indexer (MTI). Natural Language Processing (NLP), statistical, and machine learning methods for automatically recommending MeSH main heading/subheading pairs were assessed both independently and in combination. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature that produces MeSH indexing recommendations consistent with current state-of-the-art indexing practice.
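The precision and recall figures score recommended main heading/subheading (MH/SH) pairs, presumably against the pairs assigned by NLM indexers. As a minimal sketch, assuming the counts are pooled (micro-averaged) over all test citations, which the abstract does not specify, the computation looks like this. The function name `micro_precision_recall` and the sample pairs are hypothetical and purely illustrative; they are not data or code from the study.

```python
from typing import Iterable, Set, Tuple

# A MeSH indexing recommendation: (main heading, subheading).
Pair = Tuple[str, str]


def micro_precision_recall(
    docs: Iterable[Tuple[Set[Pair], Set[Pair]]]
) -> Tuple[float, float]:
    """Pool counts over (recommended, gold) pair sets, one entry per citation.

    Hypothetical helper: assumes micro-averaging over the test collection.
    """
    tp = n_recommended = n_gold = 0
    for recommended, gold in docs:
        tp += len(recommended & gold)  # pairs the indexers also assigned
        n_recommended += len(recommended)
        n_gold += len(gold)
    precision = tp / n_recommended if n_recommended else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


# Illustrative pairs only (real MeSH terms, invented assignments).
docs = [
    (
        {("Breast Neoplasms", "drug therapy"),
         ("Tamoxifen", "therapeutic use"),
         ("Tamoxifen", "adverse effects")},
        {("Breast Neoplasms", "drug therapy"),
         ("Tamoxifen", "therapeutic use"),
         ("Tamoxifen", "pharmacology")},
    ),
    (
        {("Asthma", "diagnosis")},
        {("Asthma", "drug therapy"), ("Asthma", "epidemiology")},
    ),
]

p, r = micro_precision_recall(docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here 2 of the 4 recommended pairs are correct and 2 of the 5 indexer pairs are found, so the snippet prints `precision=0.50 recall=0.40`. Pooling counts gives more weight to citations with many pairs; averaging per citation (macro-averaging) would be an alternative design choice.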
