Discovery of novel biomarkers and phenotypes by semantic technologies

Background
Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, and thus constitute an important aspect of modern pharmaceutical research and development. Increasingly, the discovery of relevant biomarkers is aided by in silico techniques that apply data mining and computational chemistry to large molecular databases. However, an even larger source of valuable information could potentially be tapped for such discoveries: repositories of research documents.

Results
This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. The InfoCodex semantic engine analyzed these documents directly, without prior human manipulation such as parsing. Against established, but different, benchmarks, recall and precision reached up to 30% and 50%, respectively. The engine was also shown to retrieve known entities that other, traditional approaches had missed. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. Among these were many interesting candidates with high potential, although noticeable noise (uninteresting or obvious terms) was also generated.

Conclusions
The reported approach of employing autonomous self-organizing semantic engines to aid biomarker discovery, supplemented by appropriate manual curation, shows promise. Conservatively, it could offer a faster alternative to vocabulary processes that depend on humans reading and analyzing all of the texts. More optimistically, it could impact pharmaceutical research, for example by shortening the time-to-market of novel drugs or by speeding up early recognition of dead ends and adverse reactions.
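The recall and precision figures quoted in the Results compare the set of candidate terms the engine proposes with a benchmark set of known entities. The sketch below shows that standard set-based computation. It is an illustration only: the function name `precision_recall` and the term sets are hypothetical toy inputs, not data or code from the study. Noise terms in the discovered set that are absent from the benchmark count as false positives and lower precision. A genuinely novel candidate is, by construction, also absent from a benchmark of known entities, so it is scored as a false positive as well.

```python
def precision_recall(discovered: set[str], benchmark: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of discovered terms against a benchmark set.

    precision = |discovered & benchmark| / |discovered|
    recall    = |discovered & benchmark| / |benchmark|
    Empty inputs yield 0.0 rather than raising ZeroDivisionError.
    """
    true_positives = len(discovered & benchmark)
    precision = true_positives / len(discovered) if discovered else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall


# Hypothetical toy sets for illustration only (not results from the paper).
discovered = {"adiponectin", "leptin", "proinsulin", "hba1c", "obesity"}
benchmark = {"adiponectin", "leptin", "hba1c", "c-peptide",
             "ghrelin", "resistin", "fetuin-a", "irisin"}

p, r = precision_recall(discovered, benchmark)
print(f"precision={p:.2f}, recall={r:.3f}")  # precision=0.60, recall=0.375
```

Here three of the five proposed terms appear in the benchmark (precision 3/5), and three of the eight benchmark terms are recovered (recall 3/8). Because the benchmarks in the study were different from one another, the same discovered set can score differently against each one.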