Detecting miRNA Mentions and Relations in Biomedical Literature

Introduction: MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression that participate in a wide spectrum of regulatory events such as apoptosis, differentiation, and stress response. Beyond their role in normal physiology, their dysregulation is implicated in a vast array of diseases. Dissecting miRNA-related associations is valuable for understanding their mechanisms in disease and can lead to the discovery of novel miRNAs for disease prognosis, diagnosis, and therapy.

Motivation: Apart from databases and prediction tools, miRNA-related information is largely available only as unstructured text. Manual retrieval of these associations is labor-intensive because of the steadily growing number of publications. In addition, most published miRNA entity recognition methods are keyword based and still require manual inspection to retrieve relations. Although several databases host miRNA associations derived from text, lower sensitivity and the lack of published details on miRNA entity recognition and relation identification motivate the development of comprehensive methods that are freely available to the scientific community. The lack of a standard corpus for miRNA relations has also made it difficult to evaluate the available systems. We propose methods to automatically extract mentions of miRNAs, species, genes/proteins, and diseases, together with their relations, from scientific literature. Our generated corpora, dictionaries, and miRNA regular expression are freely available for academic purposes. To our knowledge, these resources are the most comprehensive developed so far.

Results: Identification of specific miRNA mentions reaches a recall of 0.94 and a precision of 0.93. Extraction of miRNA-disease and miRNA-gene relations leads to an F1 score of up to 0.76. A comparison of the information extracted by our approach with the databases miR2Disease and miRSel for Alzheimer's disease related relations shows that the proposed methods identify correct relations with improved sensitivity. The published resources and described methods can help researchers maximize the retrieval of miRNA relations and generate miRNA regulatory networks.

Availability: The training and test corpora, annotation guidelines, developed dictionaries, and supplementary files are available at http://www.scai.fraunhofer.de/mirna-corpora.html
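To make the reported figures concrete, the following is a minimal Python sketch of regular-expression-based miRNA mention detection and of the F1 computation from precision and recall. The pattern `MIRNA_PATTERN` and the helpers `find_mirna_mentions` and `f1_score` are illustrative assumptions, not the regular expression or code released with the corpora; the actual pattern is in the resources at the URL above.

```python
import re

# Hypothetical, simplified pattern; NOT the regular expression published
# with the corpora. It covers common forms such as "miR-107",
# "hsa-miR-29a", "miR15", and "let-7".
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?"               # optional species prefix, e.g. "hsa-"
    r"(?:microRNA|miRNA|miR|let)"     # family keyword
    r"-?\d+[a-z]?"                    # number with optional letter suffix
    r"(?:-\d+)?"                      # optional precursor index, e.g. "-1"
    r"(?:-[35]p)?",                   # optional arm, "-3p" or "-5p"
    re.IGNORECASE,
)


def find_mirna_mentions(text):
    """Return all substrings of `text` that match the sketch pattern."""
    return [m.group(0) for m in MIRNA_PATTERN.finditer(text)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = (
        "miR-107 decreases early in Alzheimer's disease, and "
        "hsa-miR-29a targets BACE1; let-7 was unchanged."
    )
    print(find_mirna_mentions(sentence))
    # Precision 0.93 and recall 0.94 are the values reported for
    # specific miRNA mention identification.
    print(round(f1_score(0.93, 0.94), 3))
```

Under this formula, the reported precision of 0.93 and recall of 0.94 correspond to an F1 of roughly 0.935 for mention identification. A keyword-style pattern like this one will miss unusual spellings and cannot distinguish specific from generic mentions, which is the kind of limitation the annotated corpora are meant to help evaluate.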
