Algorithm for gene selection from DNA-microarray data for disease classification

In recent years, microarray technology has advanced to the point where the expression levels of several thousand genes can be obtained in a single experiment. Tens of thousands of mRNAs can be measured simultaneously, comparing the gene expression of two samples. Depending on the source of the two samples, DNA microarray data can support important investigations such as disease progression, diagnosis and drug response. When one sample comes from a healthy cell and the other from a cancerous one, it is possible to identify changes in the expression of particular genes as the disease progresses. The aim of this work is to identify a small number of genes which, taken as a set of features, can clearly classify the target disease. The problem is defined as an optimization problem: find the minimum number of genes whose expression data classify the disease type with minimum classification error. Because the genes are treated as features, microarray data are extremely high-dimensional, and the expression values of most genes are irrelevant to the investigation at hand. Moreover, the number of samples ranges from tens to at most around a hundred. In such a situation, identifying and eliminating irrelevant genes is of utmost importance. In this paper, we present a two-stage reduction. In Stage 1, the number of genes is reduced from thousands to around a hundred, and we propose a new algorithm for this reduction phase. In Stage 2, the selected genes are further reduced to only a few. We propose a way to achieve this optimization without actual experiments.
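The abstract does not specify either stage's algorithm. For orientation only, the sketch below shows one conventional way to set up a two-stage gene-selection pipeline of the same shape. It is not the new Stage 1 algorithm proposed here, and it does not reproduce the experiment-free Stage 2 optimization. Stage 1 ranks genes by a signal-to-noise score, a standard filter, and keeps the top `k`. Stage 2 greedily adds genes while the leave-one-out (LOO) error of a nearest-centroid classifier keeps decreasing. The function names (`snr_scores`, `stage1_filter`, `loo_error`, `stage2_forward`), the score, the classifier and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def snr_scores(X, y):
    """Signal-to-noise score per gene: |mean1 - mean0| / (sd1 + sd0).

    X is a list of samples (rows), each a list of gene expression values;
    y holds the 0/1 class label of each sample.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        mu_a, mu_b = sum(a) / len(a), sum(b) / len(b)
        sd_a = math.sqrt(sum((v - mu_a) ** 2 for v in a) / len(a))
        sd_b = math.sqrt(sum((v - mu_b) ** 2 for v in b) / len(b))
        # The small constant avoids division by zero for genes constant within both classes.
        scores.append(abs(mu_a - mu_b) / (sd_a + sd_b + 1e-12))
    return scores


def stage1_filter(X, y, k):
    """Stage 1 (assumed filter, not the paper's): indices of the k top-scoring genes."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def loo_error(X, y, genes):
    """Leave-one-out error of a nearest-centroid classifier using only `genes`.

    Assumes every class has at least two samples, so a centroid always exists.
    """
    errors = 0
    for i in range(len(X)):
        centroids = {}
        for lab in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroids[lab] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        # Squared Euclidean distance from the held-out sample to each class centroid.
        dist = {
            lab: sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[lab]))
            for lab in (0, 1)
        }
        prediction = min(dist, key=dist.get)
        errors += prediction != y[i]
    return errors / len(X)


def stage2_forward(X, y, candidates, max_genes=5):
    """Stage 2 (assumed): greedy forward selection over the Stage 1 candidates.

    Adds the gene that most lowers the LOO error and stops when the error no
    longer decreases, reaches zero, or max_genes genes have been chosen.
    """
    selected, best_err = [], float("inf")
    while len(selected) < max_genes:
        remaining = [g for g in candidates if g not in selected]
        if not remaining:
            break
        # Ties in error are broken by the lower gene index (tuple comparison).
        err, gene = min((loo_error(X, y, selected + [g]), g) for g in remaining)
        if err >= best_err:
            break
        selected.append(gene)
        best_err = err
        if err == 0:
            break
    return selected, best_err


if __name__ == "__main__":
    # Hypothetical synthetic data: 20 samples per class and 500 genes, where only
    # genes 0-4 are shifted by +2 in class 1 and all other genes are pure noise.
    rng = random.Random(0)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(20):
            X.append([rng.gauss(2.0 * lab if g < 5 else 0.0, 1.0) for g in range(500)])
            y.append(lab)
    top = stage1_filter(X, y, k=50)
    genes, err = stage2_forward(X, y, top, max_genes=5)
    print(f"Stage 1 kept {len(top)} genes; Stage 2 chose {genes} (LOO error {err:.3f})")
```

With only tens of samples, the LOO error moves in coarse steps of 1/n, so ties between candidate genes are common. Because of this, the stopping rule (stop as soon as the error fails to drop) matters as much as the classifier in deciding how few genes end up selected.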
