A New Gene Selection Technique Using Feature Selection Methodology

DNA microarray technology can measure the expression levels of thousands of genes simultaneously, producing huge volumes of gene expression data. Such data include complex variations in gene expression levels across the various classes of samples, which makes it possible to classify and cluster the samples using only a small subset of genes. We aim to identify the genes that discriminate most strongly between the classes of samples (e.g., normal versus diseased tissue samples). We present a new technique for gene selection and extraction that draws on various feature selection techniques. Our method computes a threshold and a discriminating capability for each gene, and then classifies the data using only the genes with the highest discriminating capabilities. The method extracts very small subsets of informative genes that can improve classification accuracy. We applied the method to four common gene expression datasets that are widely used for this purpose. Its classification performance is encouraging and competitive with that of recent similar techniques.
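The abstract does not define how the thresholds or discriminating capabilities are computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes two classes labelled 0 and 1, takes each gene's threshold to be the midpoint between the two class means, and scores a gene by the fraction of samples that the threshold alone separates correctly. The function names `gene_score` and `select_top_genes` and the toy data are hypothetical.

```python
from statistics import mean

def gene_score(values, labels):
    """Return (threshold, score) for one gene.

    Assumption: the threshold is the midpoint of the two class means, and
    the score is the fraction of samples classified correctly by it.
    """
    class0 = [v for v, c in zip(values, labels) if c == 0]
    class1 = [v for v, c in zip(values, labels) if c == 1]
    threshold = (mean(class0) + mean(class1)) / 2.0
    # Count samples where "above threshold" agrees with "belongs to class 1".
    hits = sum((v > threshold) == (c == 1) for v, c in zip(values, labels))
    acc = hits / len(values)
    # A gene may instead be under-expressed in class 1; flipping the
    # direction of the rule gives accuracy 1 - acc, so keep the better one.
    return threshold, max(acc, 1.0 - acc)

def select_top_genes(X, labels, k):
    """Return the indices of the k highest-scoring genes.

    X is a list of samples, each a list of expression values (one per gene).
    Ties are broken by gene index so the result is deterministic.
    """
    scored = []
    for g in range(len(X[0])):
        column = [row[g] for row in X]
        _, score = gene_score(column, labels)
        scored.append((score, g))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scored[:k]]

# Toy data (hypothetical): 4 samples x 3 genes, two samples per class.
X = [
    [1.0, 5.0, 3.0],
    [1.2, 4.0, 9.0],
    [3.0, 5.5, 3.5],
    [3.3, 4.5, 8.0],
]
y = [0, 0, 1, 1]

top = select_top_genes(X, y, 1)
print(top)  # [0]: only gene 0 separates the two classes perfectly
```

In practice the selected gene indices would be passed to a classifier trained on the reduced expression matrix, with selection performed inside each cross-validation fold to avoid an optimistic bias.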
