MicroRNA-Integrated and Network-Embedded Gene Selection with Diffusion Distance

Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based on both their own expression and the coordinated expression of genes within their network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address a limitation of microarray data. Although widely used for gene selection, microarray data measure only mRNA abundance, which does not always reflect the ultimate gene phenotype because it does not account for post-transcriptional effects. To overcome this limitation, which is important (and in certain cases critical) yet ignored in almost all existing studies, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also address the challenges posed by the gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely together through signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, whether directly or indirectly connected, according to the multiple paths connecting them. Using type 2 diabetes (DM2) as an example, we demonstrate that the proposed strategies enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2 association; and 3) are involved in many well-known DM2-associated pathways.
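
To make the diffusion distance idea concrete, the following is a minimal sketch in Python. It assumes the standard diffusion-map definition (a random walk on the network, with squared differences between t-step transition probabilities weighted by the inverse stationary distribution); the abstract does not state the exact formulation used in the model, so the function name `diffusion_distance`, the step count `t`, and the toy network are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def diffusion_distance(adjacency, t=3):
    """Pairwise diffusion distance on an undirected graph (illustrative sketch).

    Assumption: the standard diffusion-map definition is used,
    D_t(i, j)^2 = sum_k (P^t[i, k] - P^t[j, k])^2 / pi[k],
    where P is the random-walk transition matrix and pi its
    stationary distribution.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one edge")
    P = A / degree[:, None]                 # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, t)       # t-step transition probabilities
    stationary = degree / degree.sum()      # stationary distribution of the walk
    diff = Pt[:, None, :] - Pt[None, :, :]  # row differences for all node pairs
    return np.sqrt((diff ** 2 / stationary).sum(axis=2))

# Hypothetical toy network of five genes. Genes 0 and 2 share no direct
# edge but are linked by two paths (via gene 1 and via gene 3).
toy = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])
D = diffusion_distance(toy, t=3)
print(np.round(D, 3))
```

In this toy network, genes 1 and 3 have identical neighbours, so their diffusion distance is zero even though they are not directly connected. This mirrors the point that indirectly connected genes can still be close functional partners once all connecting paths are taken into account.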
