An optimised gene selection approach using wavelet power spectrum

Data mining helps fields such as bioinformatics process vast amounts of data. In a previous paper, we proposed a novel feature selection method for microarray data classification based on the Wavelet Power Spectrum (WPS). In this paper, we present optimisation techniques that improve the quality of the features selected by that method and use them to select 'tight genes' from various cancer microarrays. The results show that the 'tight genes' selected in this way are of higher quality and can be used across a wide variety of data sets. The selected 'tight genes' can also be used with any existing classification approach.
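To make the idea of a wavelet power spectrum concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a Haar wavelet, expression vectors whose length is a power of two, and a purely hypothetical scoring rule (total detail power) for ranking genes; the names `haar_dwt`, `wavelet_power_spectrum` and `rank_genes` are illustrative and do not come from the original work.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar decomposition.

    Returns (approximation, details), where details holds one list of
    coefficients per scale, finest scale first.
    Assumption: len(signal) is a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences capture variation at this scale, sums carry on upward.
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def wavelet_power_spectrum(signal):
    """Energy (sum of squared detail coefficients) at each scale, finest first."""
    _, details = haar_dwt(signal)
    return [sum(d * d for d in level) for level in details]


def rank_genes(expression, top_k):
    """Rank genes by total detail power (hypothetical score, for illustration only).

    expression maps a gene name to its expression values across samples.
    """
    scores = {
        gene: sum(wavelet_power_spectrum(values))
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy data: samples are assumed to be ordered so that the first half
    # belongs to one class and the second half to the other.
    toy = {
        "flat": [1, 1, 1, 1],
        "step": [1, 1, -1, -1],
        "small": [0.1, 0.1, -0.1, -0.1],
    }
    print(rank_genes(toy, top_k=2))
```

In this toy setup, a gene whose expression changes between the two halves of the sample ordering concentrates its power at the coarsest scale, while a constant gene has zero detail power. How the spectrum is actually turned into a selection criterion, and how 'tight genes' are derived from it, is described in the paper itself and is not reproduced here.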
