Improving Gene Selection in Microarray Data Analysis Using Fuzzy Patterns Inside a CBR System

In recent years, machine learning and data mining have found a successful application area in DNA microarray technology. A gene expression profile measures thousands of genes simultaneously and captures complex relationships between them. A well-known constraint of microarray data is the large number of genes compared with the small number of available experiments or cases. In this context, an accurate gene selection strategy is crucial for reducing the generalization error (false positives) of state-of-the-art classification algorithms. This paper presents a reduction algorithm based on the notion of fuzzy gene expression, in which similar (co-expressed) genes belonging to different patients are selected in order to construct a supervised prototype-based retrieval model. This technique is used to implement the retrieval step in our new gene-CBR system. The proposed method is illustrated with the analysis of microarray data from bone marrow cases of 43 adult patients with cancer, plus a group of three cases from healthy persons.
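As a rough illustration of the idea of fuzzy gene expression, the following Python sketch labels each expression value as LOW, MEDIUM or HIGH and keeps, for one class of patients, only those genes whose label is shared by most cases. It is not the algorithm of the paper: the membership functions, the per-gene ranges, the `min_support` threshold and the names `fuzzy_label` and `fuzzy_pattern` are all assumptions made for this example, and the data are invented.

```python
from collections import Counter


def fuzzy_label(value, low, high):
    """Return the fuzzy label (LOW, MEDIUM, HIGH) with the largest membership.

    Assumed membership functions: linear ramps around the midpoint of the
    hypothetical range [low, high]. These are illustrative, not the paper's.
    """
    mid = (low + high) / 2.0
    half = (high - low) / 2.0
    if half <= 0:
        return "MEDIUM"
    memberships = {
        "LOW": max(0.0, min(1.0, (mid - value) / half)),
        "HIGH": max(0.0, min(1.0, (value - mid) / half)),
        "MEDIUM": max(0.0, 1.0 - abs(value - mid) / half),
    }
    return max(memberships, key=memberships.get)


def fuzzy_pattern(cases, ranges, min_support=0.8):
    """Select genes whose fuzzy label is shared by most cases of one class.

    cases:       list of dicts mapping gene name -> expression value
    ranges:      dict mapping gene name -> (low, high) expression range
    min_support: assumed fraction of cases that must share the same label
    """
    pattern = {}
    for gene, (low, high) in ranges.items():
        labels = Counter(fuzzy_label(case[gene], low, high) for case in cases)
        label, count = labels.most_common(1)[0]
        if count / len(cases) >= min_support:
            pattern[gene] = label
    return pattern


# Invented toy data: three patients of the same class, three genes.
ranges = {"g1": (0.0, 10.0), "g2": (0.0, 10.0), "g3": (0.0, 10.0)}
patients = [
    {"g1": 9.0, "g2": 1.0, "g3": 5.0},
    {"g1": 8.5, "g2": 9.0, "g3": 4.8},
    {"g1": 9.5, "g2": 5.0, "g3": 5.2},
]

# g2 is dropped because its label differs in every patient.
print(fuzzy_pattern(patients, ranges))
```

In this sketch the resulting pattern acts as a class prototype: a new case could be compared with the pattern of each class, and genes without a consistent label are discarded, which is where the reduction in the number of genes comes from.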
