Cancer Classification from Gene Expression Data by NPPC Ensemble

The most important application of microarrays in gene expression analysis is to classify unknown tissue samples according to their gene expression levels, with the help of expression levels from known samples. In this paper, we present a nonparallel plane proximal classifier (NPPC) ensemble that achieves higher classification accuracy on test samples in a computer-aided diagnosis (CAD) framework than a single NPPC model. For each data set, only a few genes are selected by using a mutual information criterion. A genetic algorithm-based simultaneous feature and model selection scheme is then used to train a number of NPPC expert models in multiple subspaces by maximizing cross-validation accuracy. The members of the ensemble are selected according to the performance of the trained models on a validation set. Besides the usual majority voting method, we introduce a minimum average proximity-based decision combiner for the NPPC ensemble. The effectiveness of the NPPC ensemble and of the proposed approach to combining decisions for cancer diagnosis is studied and compared with that of a support vector machine (SVM) classifier in a similar framework. Experimental results on cancer data sets show that the NPPC ensemble offers testing accuracy comparable to that of an SVM ensemble, with reduced training time on average.
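The two decision combiners named above can be illustrated with a minimal sketch. The abstract does not define the minimum average proximity rule, so the version below is an assumption: each ensemble member reports the distance of a test sample to each class plane, the distances are averaged over members, and the class with the smallest average is chosen. The function names and the `distances[m][c]` input layout are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical input layout (an assumption, not from the paper):
# distances[m][c] is the distance of one test sample to the class-c
# plane of ensemble member m. A smaller distance means a closer plane.


def member_vote(dist: Sequence[float]) -> int:
    """Return the class whose plane is nearest for a single member."""
    return min(range(len(dist)), key=lambda c: dist[c])


def majority_vote(distances: List[Sequence[float]]) -> int:
    """Each member votes for its nearest plane; the most-voted class wins.

    Ties are broken in favour of the lowest class index (an arbitrary choice).
    """
    votes = Counter(member_vote(d) for d in distances)
    best = max(votes.values())
    return min(c for c, n in votes.items() if n == best)


def min_average_proximity(distances: List[Sequence[float]]) -> int:
    """Average each class's plane distance over members; pick the smallest."""
    n_members = len(distances)
    n_classes = len(distances[0])
    avg = [sum(d[c] for d in distances) / n_members for c in range(n_classes)]
    return min(range(n_classes), key=lambda c: avg[c])


# Three members, two classes. Two members narrowly prefer class 0,
# while the third places the sample very close to the class-1 plane.
example = [[0.40, 0.50], [0.45, 0.50], [2.00, 0.10]]
print(majority_vote(example))          # 0
print(min_average_proximity(example))  # 1
```

In this toy case the two rules disagree: majority voting counts only each member's hard decision, whereas the averaged-distance rule lets one member with a large margin outweigh two members with small margins.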
