Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts

Background: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs.

Results: For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced-rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms.

Conclusions: LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA-related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin.
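The pipeline described in the Results (term-by-miRNA matrix, singular value decomposition, cosine values in a reduced-rank space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the miRNA names and texts are invented placeholders, the log-tf times idf weighting is an assumption (the abstract only says the matrix is "weighted"), and the rank `k=2` and the helper names `build_matrix`, `reduce_rank`, `rank_mirnas` and `rank_terms` are hypothetical choices made for the toy data.

```python
import numpy as np

# Hypothetical toy corpus: invented placeholder names and text, standing in
# for the concatenated MEDLINE titles and abstracts of each miRNA.
DOCS = {
    "miR-C1": "tumor apoptosis invasion tumor",
    "miR-C2": "tumor apoptosis lymphoma",
    "miR-C3": "tumor invasion lymphoma",
    "miR-L1": "cholesterol lipid transport cholesterol",
    "miR-L2": "cholesterol lipid liver",
    "miR-L3": "lipid liver transport",
}

def build_matrix(docs):
    """Return (terms, names, weighted term-by-miRNA matrix)."""
    names = list(docs)
    terms = sorted({t for text in docs.values() for t in text.split()})
    row = {t: i for i, t in enumerate(terms)}
    counts = np.zeros((len(terms), len(names)))
    for j, name in enumerate(names):
        for t in docs[name].split():
            counts[row[t], j] += 1
    # Assumed weighting: log term frequency times inverse document frequency.
    doc_freq = (counts > 0).sum(axis=1)
    idf = np.log(len(names) / doc_freq) + 1.0
    return terms, names, np.log1p(counts) * idf[:, None]

def reduce_rank(matrix, k):
    """Truncated SVD: rows are k-dimensional term and miRNA vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k].T * s[:k]

def cosine(a, b):
    """Pair-wise cosine between the rows of a and the rows of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a @ b.T

terms, names, matrix = build_matrix(DOCS)
term_vecs, mirna_vecs = reduce_rank(matrix, k=2)

def rank_mirnas(keyword):
    """Prioritization: rank miRNAs by cosine to one keyword vector."""
    scores = cosine(term_vecs[[terms.index(keyword)]], mirna_vecs)[0]
    return sorted(zip(names, scores), key=lambda p: -p[1])

def rank_terms(group):
    """Annotation: rank terms by cosine to the centroid of a miRNA group."""
    idx = [names.index(n) for n in group]
    centroid = mirna_vecs[idx].mean(axis=0, keepdims=True)
    scores = cosine(centroid, term_vecs)[0]
    return sorted(zip(terms, scores), key=lambda p: -p[1])

# Clustering input: pair-wise miRNA-to-miRNA cosine values.
mirna_similarity = cosine(mirna_vecs, mirna_vecs)
```

In this toy space, `rank_mirnas("liver")` places `miR-L1` near the top even though the word "liver" never occurs in its text, which is the kind of implicit association the abstract refers to. The `mirna_similarity` matrix could be passed to any standard clustering routine, and `rank_terms(["miR-L1", "miR-L2"])` returns candidate annotation terms for that group.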
