USI at BioASQ 2015: a Semantic Similarity-based Approach for Semantic Indexing

The need to index biomedical papers with MeSH keeps growing, and automated approaches are constantly evolving. Since 2013, the BioASQ challenge has promoted these developments by providing datasets and evaluation metrics. In this paper, we present our system, USI, and describe how we adapted it to participate in this year's challenge. USI is a generic approach, which means it does not directly take into account the content of the document to annotate. The results lead us to conclude that methods relying solely on the semantic annotations available in the corpus can already perform well compared to NLP-based approaches, as our results consistently rank among the top ones.
