A combination of kernel methods and genetic programming for gene expression pattern classification

The rapidly emerging field of quantitative proteomics has established itself as a credible approach to understanding the biology of whole organisms. Classifying proteins according to their expression levels during a particular process makes it possible to discover causal relationships among the genes and proteins involved in that process. In this paper, we propose a new pattern classification algorithm that extracts user-defined patterns from a database of kinetic gene expression profiles. The algorithm combines kernel methods with genetic programming. It was tested on publicly available transcriptomic and proteomic time-series datasets, and the results showed that it could find all similar patterns in the database with a very low misclassification rate.
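The abstract does not specify which kernel is used or how it is coupled to genetic programming, so the following is only a minimal Python sketch of the kernel side of such a classifier, under stated assumptions. Each kinetic profile is z-score normalized so that its shape rather than its absolute expression level matters. It is then compared with a user-defined template through a Gaussian (RBF) kernel and labeled a match when the similarity reaches a threshold. The function names, the RBF choice, and the `gamma` and `threshold` values are hypothetical illustrations, not the authors' method, and the genetic programming component is not shown.

```python
import math
from typing import Sequence


def zscore(profile: Sequence[float]) -> list[float]:
    """Center and scale a profile so that shape, not absolute level, drives similarity."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0.0:
        return [0.0] * n  # flat profile: carries no shape information
    return [(x - mean) / sd for x in profile]


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time points")
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def classify(profiles: Sequence[Sequence[float]], template: Sequence[float],
             gamma: float = 0.1, threshold: float = 0.5) -> list[bool]:
    """Label each profile True if its kernel similarity to the template reaches the threshold.

    gamma and threshold are hypothetical defaults chosen for illustration only.
    """
    t = zscore(template)
    return [rbf_kernel(zscore(p), t, gamma) >= threshold for p in profiles]


def misclassification_rate(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of profiles whose predicted label differs from the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    template = [0, 1, 2, 3, 4]  # user-defined "rising" pattern over five time points
    profiles = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [5, 4, 3, 2, 1]]
    labels = classify(profiles, template)
    print(labels)
    print(misclassification_rate(labels, [True, True, False]))
```

In this toy example, both rising profiles have the same shape as the template after normalization and are labeled as matches, while the falling profile scores exp(-2), about 0.14, and is rejected. In practice, parameters such as `gamma` and `threshold` would need to be tuned against labeled examples rather than fixed by hand.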
