BioTextRetriever: yet another Information Retrieval system

It is of capital importance for every researcher to be aware of the work that has been done in their research area. However, finding “interesting/relevant” publications among the overwhelming number of documents available on the Internet is quite difficult. We propose the use of Text Mining to address this information overload problem by automating the process of extracting relevant papers from very large repositories of scientific literature. In this paper we present the automatic construction of a classifier capable of selecting the relevant papers from the whole of MEDLINE; the classifier is part of a software tool we developed, BioTextRetriever. The empirical evaluation shows classifier accuracies of around 95%.
