A One-Size-Fits-All Indexing Method Does Not Exist: Automatic Selection Based on Meta-Learning

We present a methodology that automatically selects an indexing algorithm for each heading in Medical Subject Headings (MeSH), the National Library of Medicine’s vocabulary for indexing MEDLINE. Manually comparing indexing methods is manageable for a limited number of MeSH headings, but the large number of headings makes automating this selection desirable. Results show that the selection can be automated using previously indexed MEDLINE citations. We find that AdaBoostM1 is better suited to indexing a group of MeSH headings named Check Tags, and that it improves the micro F-measure from 0.5385 to 0.7157 and the macro F-measure from 0.4123 to 0.5387 (both p < 0.01).

Category: Convergence computing
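
As a rough illustration of per-heading selection and of the two averages reported in the abstract, the sketch below picks, for each heading, the method with the highest F-measure on held-out citations, then computes micro and macro F-measure over the chosen methods. It is a minimal sketch under stated assumptions, not the paper's implementation: the method names, the counts, and the helper functions are hypothetical, and the macro F-measure is assumed here to be the mean of per-heading F-measures.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one heading and method
Counts = Tuple[int, int, int]


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 computed directly from counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_per_heading(results: Dict[str, Dict[str, Counts]]) -> Dict[str, str]:
    """For each heading, keep the method with the best held-out F-measure."""
    return {
        heading: max(methods, key=lambda m: f_measure(*methods[m]))
        for heading, methods in results.items()
    }


def micro_macro(results: Dict[str, Dict[str, Counts]],
                choice: Dict[str, str]) -> Tuple[float, float]:
    """Micro F pools counts across headings; macro F averages per-heading F."""
    chosen = [results[h][m] for h, m in choice.items()]
    tp = sum(c[0] for c in chosen)
    fp = sum(c[1] for c in chosen)
    fn = sum(c[2] for c in chosen)
    micro = f_measure(tp, fp, fn)
    macro = sum(f_measure(*c) for c in chosen) / len(chosen)
    return micro, macro


# Invented counts for two headings and two hypothetical methods.
results = {
    "Humans": {"method_a": (80, 10, 10), "method_b": (70, 5, 25)},
    "Male": {"method_a": (30, 30, 20), "method_b": (40, 10, 10)},
}

choice = select_per_heading(results)
micro, macro = micro_macro(results, choice)
print(choice)
print(f"micro F = {micro:.4f}, macro F = {macro:.4f}")
```

Because micro averaging pools counts, frequent headings dominate it, whereas macro averaging weights every heading equally; that is one reason the two figures can differ noticeably.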
