Cancer classification based on microarray gene expression data using a principal component accumulation method

The classification of cancer is a major research topic in bioinformatics. The high dimensionality and small sample size of gene expression data, however, make the classification challenging. Although principal component analysis (PCA) is well suited to high-dimensional data, it may overemphasize some aspects of richly complex data and ignore other important information, because it typically displays only the differences captured in the first two or three principal component (PC) dimensions. Building on PCA, a principal component accumulation (PCAcc) method is proposed. It uses the information contained in multiple PC subspaces and improves the class separability of cancers. The method was evaluated on four commonly used gene expression datasets, and the results show that it performs well for cancer classification.
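The abstract does not spell out how PCAcc combines PC subspaces, so the following is only an illustrative sketch of the underlying idea: class-separating information may lie beyond the first two or three PCs. It computes PCA scores with NumPy and accumulates a per-PC Fisher-style separation score over successive components. The function names, the synthetic data, and the choice of a Fisher-style score are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                       # center each gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def cumulative_fisher_score(scores, labels):
    """Running sum, over PCs, of a two-class Fisher-style separation score.

    Hypothetical helper: this is NOT the PCAcc definition, only one simple
    way to see how separation accumulates as more PCs are included.
    """
    labels = np.asarray(labels)
    a, b = scores[labels == 0], scores[labels == 1]
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    within = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return np.cumsum(between / within)

# Synthetic stand-in for a microarray matrix: few samples, many genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 500
labels = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[:, :5] *= 10.0               # high-variance genes unrelated to class
X[labels == 1, 5:15] += 2.0    # class signal carried by lower-variance genes

scores = pca_scores(X, n_components=10)
acc = cumulative_fisher_score(scores, labels)
print(acc)  # accumulated separation after 1, 2, ..., 10 PCs
```

In this synthetic setup the leading PCs are dominated by the high-variance, class-unrelated genes, so comparing `acc[1]` with `acc[-1]` indicates how much separation a two-dimensional PC plot would leave out.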
