Mining biomarker information in biomedical literature

Background
For selection and evaluation of potential biomarkers, inclusion of already published information is of utmost importance. In spite of significant advancements in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for the retrieval of biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end user perspectives.

Methods
A biomarker terminology was created and further organized into six concept classes. Performance of this terminology was optimized towards balanced selectivity and specificity. The information retrieval performance using the biomarker terminology was evaluated based on various combinations of the terminology's six classes. Further validation of these results was performed on two independent corpora representing two different neurodegenerative diseases.

Results
In its current state, the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The information retrieval results show an improved retrieval rate of informative abstracts, achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of unspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature. A demo version of the search engine SCAIView, including the biomarker retrieval, is made available to the public through http://www.scaiview.com/scaiview-academia.html.

Conclusions
The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may help in finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.
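
The class-combination filtering described in the Methods and Results can be illustrated with a minimal sketch. This is not the SCAIView implementation or the paper's terminology: the class names, synonyms, example abstracts, and the helper functions `classes_in` and `retrieve` are all hypothetical, and simple dictionary matching stands in for the actual named entity recognition.

```python
import re

# Hypothetical mini-terminology: concept class -> synonyms.
# Class names and terms are illustrative only, not taken from the paper.
TERMINOLOGY = {
    "clinical_management": ["diagnosis", "prognosis", "treatment response"],
    "evidence": ["overexpression", "polymorphism", "upregulated"],
    "disease": ["alzheimer's disease", "parkinson's disease"],
    "gene": ["APOE", "SNCA"],
}


def classes_in(text):
    """Return the set of concept classes with at least one synonym in text."""
    found = set()
    for cls, synonyms in TERMINOLOGY.items():
        for syn in synonyms:
            # Case-insensitive match that must not sit inside a longer word.
            pattern = r"(?<!\w)" + re.escape(syn) + r"(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(cls)
                break
    return found


def retrieve(abstracts, required):
    """Keep only abstracts that mention every class in `required`."""
    required = set(required)
    return [a for a in abstracts if required <= classes_in(a)]


# Made-up example abstracts (assumption, for illustration only).
abstracts = [
    "APOE polymorphism is associated with prognosis in Alzheimer's disease.",
    "SNCA was studied in a mouse model.",
    "Treatment response in Parkinson's disease varies widely.",
]

# Requiring more classes narrows the result set.
hits = retrieve(abstracts, ["disease", "gene", "evidence", "clinical_management"])
```

Requiring only the `disease` class keeps two of the three example abstracts, while requiring all four classes keeps one, which mirrors how additional class filters reduce unspecific results.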
