Data Integration for Gene Expression Prediction

In computational systems biology, predicting the exact value of gene expression for downstream meta-analysis is a challenging problem. To address it, a data integration approach combined with a regression task is proposed. To improve prediction performance, gene expression data, which consist of continuous values, are integrated with binary data derived from miRNA-mRNA regulation pairs using a simple approach. For the regression task, the Relevance Vector Machine (RVM), a recently introduced method, and linear regression are used. Performance is evaluated with the Spearman and Pearson correlation coefficients and the root mean squared error (RMSE). The results show that the proposed approach can significantly improve prediction performance. Data integration and RVM are promising for many machine learning problems.
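The abstract does not spell out the integration scheme or the evaluation code, so the following is only a minimal sketch under stated assumptions. It assumes that "integration" means concatenating each gene's continuous expression features with its binary miRNA regulation flags, and it implements the three reported metrics in plain Python. The function names (`integrate`, `pearson`, `spearman`, `rmse`) and all toy values are hypothetical; the regression step itself (RVM or linear regression) is omitted.

```python
import math

def integrate(expression, regulation):
    """Assumed integration: append binary miRNA flags to each gene's expression features."""
    return [list(e) + [float(b) for b in r] for e, r in zip(expression, regulation)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Toy data (hypothetical): two genes, two expression features, three miRNA flags.
expression = [[0.5, 1.2], [2.0, 0.3]]
regulation = [[1, 0, 1], [0, 0, 1]]
integrated = integrate(expression, regulation)

# Toy targets and predictions (hypothetical) to exercise the metrics.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(integrated)
print(pearson(y_true, y_pred), spearman(y_true, y_pred), rmse(y_true, y_pred))
```

Any regressor can be trained on `integrated` in place of the raw expression features; comparing the three metrics with and without the binary columns mirrors the kind of comparison the abstract describes.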
