Combining singular value decomposition and t-test into hybrid approach for significant gene extraction from microarray data

Significant gene extraction from microarray data is a challenging problem of great interest to researchers in Computational Biology, Medicine, Computer Science and Statistics. A number of methods have been proposed for extracting the smallest set of genes that can accurately classify different samples. Most of these methods ignore the fact that microarray data are typically noisy. For instance, using a statistical t-test alone has been shown to be insufficient, since it results in a high false discovery rate. Recently, a singular value decomposition (SVD) based approach was proposed for reducing time series microarray data; however, it proved ineffective for classifying microarray data. To overcome the shortcomings of these approaches, this paper proposes two methods that reduce false discovery rates. The first method is an iterative t-test that computes the p-value of each gene under perturbation, eliminating one sample at a time. It removes weak, noisy genes by dropping any gene that does not show a significant p-value under every perturbation. The second method is a hybrid process that combines the SVD and the t-test. It considers the entropy of the whole dataset and thus takes the correlation between genes into account. Classification accuracy is used to validate the significance of the extracted genes. The reported test results on two datasets demonstrate the applicability and effectiveness of the two proposed methods.
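The leave-one-out idea behind the first method can be illustrated with a short sketch. This is a minimal, stdlib-only Python illustration and not the authors' implementation. It assumes a two-class problem, a pooled-variance Student's t-test, a threshold alpha = 0.05 and no multiple-testing correction. These choices, and the names `regularized_beta`, `t_test_pvalue` and `iterative_t_test`, are hypothetical. The two-sided p-value uses the identity p = I_x(ν/2, 1/2) with x = ν/(ν + t²), where I is the regularized incomplete beta function evaluated by its standard continued-fraction expansion.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_test_pvalue(group_a, group_b):
    """Two-sided pooled-variance (Student's) t-test p-value for two samples."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 1 or n2 < 1 or n1 + n2 < 3:
        raise ValueError("need at least one value per group and three in total")
    m1, m2 = sum(group_a) / n1, sum(group_b) / n2
    ss = sum((v - m1) ** 2 for v in group_a) + sum((v - m2) ** 2 for v in group_b)
    df = n1 + n2 - 2
    se = math.sqrt((ss / df) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0 if m1 == m2 else 0.0
    t = (m1 - m2) / se
    return regularized_beta(df / 2.0, 0.5, df / (df + t * t))


def iterative_t_test(expr, labels, alpha=0.05):
    """Keep genes whose t-test p-value stays below alpha in every leave-one-sample-out run.

    expr: one list of expression values per gene (one value per sample).
    labels: class label (0 or 1) per sample; each class needs at least 2 samples.
    """
    n = len(labels)
    if min(labels.count(0), labels.count(1)) < 2:
        raise ValueError("each class needs at least two samples")
    kept = []
    for g, row in enumerate(expr):
        robust = True
        for drop in range(n):  # perturb the data by removing one sample
            a = [row[j] for j in range(n) if j != drop and labels[j] == 0]
            b = [row[j] for j in range(n) if j != drop and labels[j] == 1]
            if t_test_pvalue(a, b) >= alpha:
                robust = False  # not significant under this perturbation
                break
        if robust:
            kept.append(g)
    return kept


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    expr = [
        [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],  # clearly differential gene
        [1.0, 2.0, 3.0, 1.5, 2.5, 2.0],  # noise: equal class means
    ]
    print(iterative_t_test(expr, labels))
```

Because a gene must pass every leave-one-out test, a gene whose apparent significance depends on a single sample is discarded. This is the mechanism by which the abstract says the iterative t-test lowers the false discovery rate. The SVD/entropy hybrid is not sketched here, because the abstract does not specify how the two components are combined.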
