Gene selection using independent variable group analysis for tumor classification

Microarrays can measure the expression levels of thousands of genes simultaneously. Gene expression data from DNA microarrays are therefore characterized by many measured variables (genes) on only a few samples. One important application of gene expression data is sample classification. In statistical terms, the very large number of predictors relative to the small number of samples makes most classical “class prediction” methods inapplicable. Generally, this problem can be avoided by selecting only the relevant features or by extracting new features that contain the maximal information about the class label from the original data. In this paper, we propose a new gene selection method based on independent variable group analysis (IVGA). We first use the t-statistic to select a subset of genes from the original data. We then use IVGA to select, from this subset, the key genes for tumor classification. Finally, we use a support vector machine (SVM) to classify tumors based on the key genes. To validate its effectiveness, we apply the proposed method to three different DNA microarray data sets. The prediction results show that our method is efficient and feasible.
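The first stage of the pipeline, filtering genes by a t-statistic, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: a two-class problem, the Welch (unequal-variance) form of the statistic, and ranking by absolute value. The function names `welch_t` and `select_top_genes` and the toy data are hypothetical; the IVGA and SVM stages are not shown.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of expression values."""
    va, vb = variance(a), variance(b)
    denom = math.sqrt(va / len(a) + vb / len(b))
    # Guard against constant genes, which have zero variance in both classes.
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def select_top_genes(X, y, k):
    """Return the indices of the k genes with the largest |t|.

    X: list of samples, each a list of gene expression values.
    y: list of binary class labels (0 or 1), one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(a, b)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Synthetic toy data (assumption, not from the paper): 20 samples, 50 genes,
# where only genes 3 and 7 differ between the two classes.
random.seed(0)
y = [0] * 10 + [1] * 10
X = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in y]
for row, label in zip(X, y):
    if label == 1:
        row[3] += 6.0
        row[7] -= 6.0

selected = select_top_genes(X, y, k=2)
print(sorted(selected))
```

In a full pipeline of the kind described, the indices returned here would be passed on to the IVGA-based selection step, and the resulting key genes to an SVM classifier.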
