MEDRank: Using graph-based concept ranking to index biomedical texts

BACKGROUND As the volume of biomedical text increases exponentially, automatic indexing becomes increasingly important. However, existing approaches do not distinguish central (or core) concepts from concepts mentioned only in passing. We focus on the problem of indexing MEDLINE records, a process currently performed by highly trained human indexers at the National Library of Medicine (NLM). NLM indexers are assisted by a system called the Medical Text Indexer (MTI), which suggests candidate indexing terms.

OBJECTIVE To improve the ability of MTI to select the core terms in MEDLINE abstracts. These core concepts are deemed the most important and are designated as "major headings" by MEDLINE indexers. We introduce and evaluate a graph-based indexing methodology called MEDRank, which generates concept graphs from biomedical text and then ranks the concepts within these graphs to identify the most important ones.

METHODS We inserted a MEDRank step into MTI and compared MTI's output, with and without MEDRank, to the terms selected by MEDLINE indexers for a sample of 11,803 PubMed Central articles. We also tested which terms human raters preferred among those generated by the MEDLINE indexers, MTI without MEDRank, and MTI with MEDRank, for a sample of 36 PubMed Central articles.

RESULTS MEDRank improved recall of indexer-designated major headings by 30% over MTI without MEDRank (0.489 vs. 0.376). Overall recall was only slightly (6.5%) higher (0.490 vs. 0.460), as was F(2) (3%, 0.408 vs. 0.396). However, overall precision was 3.9% lower (0.268 vs. 0.279). Human raters preferred terms generated by MTI with MEDRank over terms generated by MTI without MEDRank (by an average of 1.00 more term per article), and preferred terms generated by MTI with MEDRank and terms chosen by the MEDLINE indexers at the same rate.

CONCLUSIONS The addition of MEDRank to MTI significantly improved the retrieval of core concepts in MEDLINE abstracts and more closely matched human expectations compared to MTI without MEDRank. In addition, MEDRank slightly improved overall recall and F(2).
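
The abstract describes ranking concepts within a concept graph but gives no algorithmic detail. As a rough illustration only, the following sketch applies a PageRank-style power iteration to a small undirected, weighted concept graph. It is not the authors' implementation: the function name `rank_concepts`, the damping factor, the iteration count, and the example concepts and edge weights are all assumptions made for this example.

```python
from collections import defaultdict


def rank_concepts(edges, damping=0.85, iterations=50):
    """Rank nodes of an undirected weighted graph with a PageRank-style iteration.

    edges: iterable of (concept_a, concept_b, weight) tuples (hypothetical input format).
    Returns a list of (concept, score) pairs sorted by descending score.
    """
    # Build a symmetric adjacency map: weights[a][b] == weights[b][a].
    weights = defaultdict(dict)
    for a, b, w in edges:
        weights[a][b] = weights[a].get(b, 0.0) + w
        weights[b][a] = weights[b].get(a, 0.0) + w

    nodes = sorted(weights)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_score = {}
        for node in nodes:
            # Each neighbor passes on its score in proportion to the edge weight.
            incoming = sum(
                score[nbr] * w / sum(weights[nbr].values())
                for nbr, w in weights[node].items()
            )
            new_score[node] = (1 - damping) / n + damping * incoming
        score = new_score

    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical concept graph: weights stand in for co-occurrence strength.
edges = [
    ("Asthma", "Inflammation", 2.0),
    ("Asthma", "Glucocorticoids", 3.0),
    ("Asthma", "Child", 1.0),
    ("Glucocorticoids", "Inflammation", 1.0),
]

ranked = rank_concepts(edges)
for concept, value in ranked:
    print(f"{concept}: {value:.3f}")
```

In this toy graph the most strongly connected concept receives the highest score, which mirrors the idea of separating core concepts from those mentioned in passing. How MEDRank actually builds its graphs and which ranking function it uses would need to be taken from the full paper.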
