Classification approaches for microarray gene expression data analysis.

Classification approaches have been developed, adopted, and applied to distinguish disease classes at the molecular level using microarray data. Recently, a novel class of hierarchical probabilistic models based on a kernel-imbedding technique has become one of the best-performing classification tools for microarray data analysis. These models were first developed as kernel-imbedded Gaussian processes (KIGPs) for binary classification problems using microarray gene expression data and were then further developed for multiclass classification problems under a unifying Bayesian framework. Specifically, an adaptive algorithm with a cascading structure was designed to find appropriate featuring kernels, to discover potentially significant genes, and to make optimal disease (e.g., tumor/cancer) class predictions with associated Bayesian posterior probabilities. Simulation studies and applications to published real data showed that KIGPs performed very close to the Bayesian bound and consistently outperformed, or performed on par with the best of, many state-of-the-art methods. The most distinctive advantage of the KIGP approach is its ability to explore both the linear and the nonlinear underlying relationships between the target features of a given disease classification problem and the explanatory gene expression data. This line of research points to the broader applicability of the KIGP approach to other high-throughput omics data, including omics data collected as time series, especially when methods based on linear models fail.
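The central idea above, that the kernel determines whether a Gaussian-process classifier can exploit linear or nonlinear structure while still returning posterior class probabilities, can be illustrated with a small sketch. This is a generic binary GP classifier (Laplace approximation with a logistic likelihood, following the textbook algorithms of Rasmussen and Williams), not the KIGP method itself: KIGP's probit formulation, cascading kernel search, and Bayesian gene selection are not reproduced. The toy XOR-like data, the function names (`fit_laplace`, `predict_proba`), and the fixed RBF length scale are illustrative assumptions; the code is pure Python to stay self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_kernel(x, z):
    # Linear kernel: the classifier can only use linear relationships.
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, length_scale=1.0):
    # Gaussian (RBF) kernel: allows nonlinear decision boundaries.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    # Back substitution: solve L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_laplace(X, y, kernel, max_iter=50, tol=1e-10):
    """Newton search for the posterior mode of the latent function f
    (labels in {+1, -1}, logistic likelihood); cf. Rasmussen & Williams, Alg. 3.1."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    t = [(yi + 1) / 2 for yi in y]  # labels mapped to {0, 1}
    f = [0.0] * n
    for _ in range(max_iter):
        pi = [sigmoid(fi) for fi in f]
        sW = [math.sqrt(p * (1.0 - p)) for p in pi]  # W^(1/2), W = -Hessian of log-lik
        grad = [ti - p for ti, p in zip(t, pi)]  # gradient of the log-likelihood
        B = [[(1.0 if i == j else 0.0) + sW[i] * K[i][j] * sW[j] for j in range(n)]
             for i in range(n)]
        L = cholesky(B)  # B = I + W^(1/2) K W^(1/2) is well conditioned
        b = [s * s * fi + g for s, fi, g in zip(sW, f, grad)]
        Kb = [sum(k * v for k, v in zip(row, b)) for row in K]
        c = solve_upper_t(L, solve_lower(L, [s * v for s, v in zip(sW, Kb)]))
        a = [bi - s * ci for bi, s, ci in zip(b, sW, c)]
        f_new = [sum(k * v for k, v in zip(row, a)) for row in K]
        converged = max(abs(u - v) for u, v in zip(f_new, f)) < tol
        f = f_new
        if converged:
            break
    # Quantities from the last Newton step (at the approximately converged mode).
    return {"X": X, "kernel": kernel, "grad": grad, "sW": sW, "L": L}

def predict_proba(model, x_star):
    """Approximate P(y = +1 | x*), cf. Rasmussen & Williams, Alg. 3.2, using
    MacKay's approximation to average the sigmoid over the latent variance."""
    kernel = model["kernel"]
    k_star = [kernel(xi, x_star) for xi in model["X"]]
    mean = sum(k * g for k, g in zip(k_star, model["grad"]))
    v = solve_lower(model["L"], [s * k for s, k in zip(model["sW"], k_star)])
    var = max(kernel(x_star, x_star) - sum(vi * vi for vi in v), 0.0)
    return sigmoid(mean / math.sqrt(1.0 + math.pi * var / 8.0))

# Toy XOR-like data (a hypothetical stand-in for two genes' expression values):
# class +1 in quadrants I and III, class -1 in quadrants II and IV.
X, y = [], []
for sx in (1, -1):
    for sy in (1, -1):
        for a in (0.5, 1.0):
            for b in (0.5, 1.0):
                X.append((sx * a, sy * b))
                y.append(sx * sy)

for name, kern in [("linear", linear_kernel), ("rbf", rbf_kernel)]:
    model = fit_laplace(X, y, kern)
    print(name, predict_proba(model, (1, 1)), predict_proba(model, (1, -1)))
```

On this symmetric toy data the linear kernel cannot separate the classes, so both test points receive a probability of exactly 0.5, whereas the RBF kernel assigns the quadrant-I point a probability above 0.5 and the quadrant-IV point one below 0.5. A practical implementation would use an optimized linear-algebra library and infer kernel hyperparameters rather than fixing them.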
