Missing value imputation for microRNA expression data by using a GO-based similarity measure

Background
Missing values are commonly present in microarray data profiles. Instead of discarding genes or samples with incomplete expression levels, missing values need to be properly imputed for accurate data analysis. Imputation methods can be roughly divided into two categories: expression-based and domain knowledge-based. Methods of the first type rely only on expression data, without help from external data sources. Methods of the second type incorporate available domain knowledge into the expression data to improve imputation accuracy.

In recent years, microRNA (miRNA) microarrays have been widely developed and used to identify miRNA biomarkers in studies of complex human diseases. As with mRNA profiles, miRNA expression profiles with missing values can be handled by existing imputation methods. However, domain knowledge-based methods are difficult to apply because miRNAs lack direct functional annotation. With the rapid accumulation of miRNA microarray data, there is a growing need for domain knowledge-based imputation algorithms specific to miRNA expression profiles, to improve the quality of miRNA data analysis.

Results
We connect miRNAs to Gene Ontology (GO) domain knowledge through their target genes, and define miRNA functional similarity based on the semantic similarity of GO terms in the GO graphs. A new measure that combines miRNA functional similarity with expression similarity is then used to impute missing values. The new measure is tested on two miRNA microarray datasets from breast cancer research and achieves better performance than the expression-based method on both datasets.

Conclusions
The experimental results demonstrate that biological domain knowledge can benefit the estimation of missing values in miRNA profiles, just as it does for mRNA profiles.
In particular, functional similarity defined by the GO terms annotated to the target genes of miRNAs can provide useful complementary information to the expression-based method and improve the imputation accuracy of miRNA array data. Our method and data are available to the public upon request.
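The Results paragraph describes the method in one sentence per step. The Python sketch below shows one plausible way to wire those steps together. Every concrete choice in it is an assumption, not the authors' published procedure: Lin's information-content measure for GO term similarity, a best-match average over the pooled GO terms of each miRNA's target genes for miRNA functional similarity, 1/(1 + RMS distance) for expression similarity, a linear mix controlled by a hypothetical weight `alpha`, and similarity-weighted k-nearest-neighbour imputation. The GO graph, gene annotations, target sets and expression values are hypothetical toy data. Setting `alpha=0.0` reduces the score to expression similarity alone, which stands in for an expression-based baseline.

```python
import math

# Hypothetical toy GO graph (term -> parent terms); real GO graphs are far larger.
GO_PARENTS = {"root": set(), "A": {"root"}, "B": {"root"},
              "A1": {"A"}, "A2": {"A"}, "B1": {"B"}}

def ancestors(term):
    """Return the term itself plus every ancestor reachable via parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in GO_PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(gene_go):
    """IC(t) = -log p(t), with p(t) the share of annotations at or below t."""
    counts = dict.fromkeys(GO_PARENTS, 0)
    for terms in gene_go.values():
        for term in terms:
            for anc in ancestors(term):
                counts[anc] += 1
    total = counts["root"]
    return {t: -math.log(c / total) for t, c in counts.items() if c}

def lin(t1, t2, ic):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    if t1 == t2:
        return 1.0
    mica = max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))
    denom = ic.get(t1, 0.0) + ic.get(t2, 0.0)
    return 2 * mica / denom if denom else 0.0

def functional_similarity(targets1, targets2, gene_go, ic):
    """Best-match average of Lin scores between the pooled GO terms
    annotated to the target genes of two miRNAs (assumed aggregation)."""
    go1 = {t for g in targets1 for t in gene_go.get(g, ())}
    go2 = {t for g in targets2 for t in gene_go.get(g, ())}
    if not go1 or not go2:
        return 0.0
    best1 = [max(lin(a, b, ic) for b in go2) for a in go1]
    best2 = [max(lin(a, b, ic) for a in go1) for b in go2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))

def expression_similarity(x, y):
    """1 / (1 + RMS distance) over samples observed in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return 0.0
    rms = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
    return 1.0 / (1.0 + rms)

def impute(expr, targets, gene_go, k=2, alpha=0.5):
    """Fill each None with a similarity-weighted mean of the k most similar
    miRNAs observed in that sample, using the hypothetical combined score
    alpha * functional + (1 - alpha) * expression."""
    ic = information_content(gene_go)
    result = {m: list(v) for m, v in expr.items()}
    for m, profile in expr.items():
        for j, value in enumerate(profile):
            if value is not None:
                continue
            scored = []
            for n, other in expr.items():
                if n == m or other[j] is None:
                    continue
                s = (alpha * functional_similarity(targets[m], targets[n], gene_go, ic)
                     + (1 - alpha) * expression_similarity(profile, other))
                scored.append((s, other[j]))
            top = sorted(scored, reverse=True)[:k]
            weight = sum(s for s, _ in top)
            if weight > 0:
                result[m][j] = sum(s * v for s, v in top) / weight
    return result

# Hypothetical toy data: three miRNAs, three samples, one missing value.
gene_go = {"g1": {"A1"}, "g2": {"A2"}, "g3": {"B1"}}
targets = {"miR-x": {"g1"}, "miR-y": {"g2"}, "miR-z": {"g3"}}
expr = {"miR-x": [1.0, None, 3.0],
        "miR-y": [1.1, 2.1, 2.9],
        "miR-z": [5.0, 9.0, 1.0]}
print(round(impute(expr, targets, gene_go, alpha=0.0)["miR-x"][1], 2))  # 3.54
print(round(impute(expr, targets, gene_go, alpha=0.5)["miR-x"][1], 2))  # 3.19
```

In this toy run, mixing in functional similarity pulls the imputed value toward miR-y, whose target gene shares the GO parent `A` with the target of miR-x, while the expression-only score leaves relatively more weight on the functionally unrelated miR-z. In practice `alpha` and `k` would have to be tuned, for example by masking known entries and measuring reconstruction error; nothing here reproduces the breast cancer experiments.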