Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification.

Identifying disease-relevant genes is a challenge in microarray-based disease diagnosis, where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven quite promising for variable selection. However, designing and applying an RJMCMC algorithm requires, for example, special criteria for prior distributions. Simulation from the joint posterior distribution of models is also computationally expensive and may even be mathematically intractable. These disadvantages may limit the application of RJMCMC algorithms. Algorithms are therefore needed that retain the advantages of RJMCMC methods while being efficient and easy to follow for selecting disease-associated genes. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top two ranked genes are Z50753 and U00968 for the colon dataset, and Y10871_at and Z22536_at for the estrogen dataset. The source code is freely available to non-commercial users under the GNU General Public License Version 2.0 at http://code.google.com/p/randomfrog/.
