Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature

Abstract

The detection of microRNA (miRNA) mentions in scientific literature enables researchers to find relevant literature using queries formulated with miRNA information. Because most published biological studies describe signal transduction pathways or genetic regulatory information in figure captions, extracting miRNAs from both the main text and the figure captions of a manuscript is useful for aggregate and comparative analyses of published studies. In this study, we present a statistical principle-based miRNA recognition and normalization method that identifies miRNAs and links them to identifiers in the Rfam database. As one of the core components of the text mining pipeline of the miRTarBase database, the proposed method combines the advantages of previous pattern-based, dictionary-based and supervised learning approaches and provides an integrated solution to the problem of miRNA identification. Furthermore, the knowledge learned from the training data is organized in a human-interpretable manner, which makes it possible to understand why the system considers a span of text to be a miRNA mention, and this represented knowledge can be further complemented by domain experts. We studied the ambiguity of miRNA nomenclature in order to connect miRNA mentions to the Rfam database, and evaluated our approach on two datasets: the BioCreative VI Bio-ID corpus and the miRNA interaction corpus, the latter extended with additional Rfam normalization information. Our study highlights the challenges associated with miRNA identification and normalization in scientific literature, offers a better understanding of them, and identifies the research gap that remains to be explored in future studies.
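The abstract mentions combining pattern, dictionary and supervised components and notes that miRNA nomenclature is ambiguous. The Python sketch below illustrates only the first two ingredients, under stated assumptions. It uses a regular expression for common naming conventions (optional species prefix, miR/microRNA/let stem, family number, paralog letter, 5p/3p arm) and a lookup into a small family dictionary. The names `MIRNA_PATTERN`, `find_mirnas`, `normalize` and `FAMILY_DICTIONARY` are hypothetical, the `RF_PLACEHOLDER_*` values are not real Rfam accessions, and this is not the statistical principle-based method itself.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Illustrative pattern for common miRNA surface forms such as "miR-21",
# "hsa-miR-21-5p", "microRNA-155" and "let-7a". It deliberately ignores
# clusters ("miR-17~92"), ranges ("miR-200a/b/c") and gene symbols ("MIR21").
MIRNA_PATTERN = re.compile(
    r"\b(?:(?!pre-)(?P<species>[a-z]{3})-)?"    # optional species prefix, e.g. "hsa" (not "pre")
    r"(?P<stem>microRNA|miRNA|miR|let)-"         # naming stem, longest alternatives first
    r"(?P<number>\d+)"                           # family number
    r"(?P<letter>[a-z])?"                        # optional paralog letter
    r"(?:-(?P<locus>\d+))?"                      # optional genomic locus copy
    r"(?:-(?P<arm>[35]p))?"                      # optional 5p/3p arm
    r"(?!\w)",                                   # must not be followed by a word character
    re.IGNORECASE,
)


class Mention(NamedTuple):
    text: str
    start: int
    end: int
    key: str                 # coarse family-level key, e.g. "mir-21" or "let-7"
    species: Optional[str]


def normalize_key(match: re.Match) -> str:
    """Collapse a surface form to a species-, arm- and paralog-agnostic key."""
    prefix = "let" if match.group("stem").lower() == "let" else "mir"
    return f"{prefix}-{int(match.group('number'))}"


def find_mirnas(text: str) -> List[Mention]:
    """Return every pattern match in the text as a Mention."""
    mentions = []
    for m in MIRNA_PATTERN.finditer(text):
        species = m.group("species")
        mentions.append(Mention(m.group(0), m.start(), m.end(), normalize_key(m),
                                species.lower() if species else None))
    return mentions


# Hypothetical toy dictionary; a real system would derive it from Rfam/miRBase.
FAMILY_DICTIONARY: Dict[str, str] = {
    "mir-21": "RF_PLACEHOLDER_0001",
    "mir-155": "RF_PLACEHOLDER_0002",
    "let-7": "RF_PLACEHOLDER_0003",
}


def normalize(mentions: List[Mention]) -> Dict[str, Optional[str]]:
    """Map each mention's surface text to a family identifier, or None if unknown."""
    return {m.text: FAMILY_DICTIONARY.get(m.key) for m in mentions}


if __name__ == "__main__":
    sentence = ("Expression of hsa-miR-21-5p and microRNA-155 was elevated, "
                "whereas let-7a and pre-miR-9 were reduced.")
    found = find_mirnas(sentence)
    for mention in found:
        print(mention.text, mention.key, mention.species)
    print(normalize(found))
```

Collapsing "hsa-miR-21-5p" and a bare "miR-21" to the same key "mir-21" is what makes dictionary linking possible, but it also discards species and arm information. A mention such as "miR-9" that has no dictionary entry stays unlinked. These are the kinds of ambiguity and coverage gaps a learned component has to address.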
