Classification of Scleroderma and Normal Biopsy Data and Identification of Possible Biomarkers of the Disease

Scleroderma is an autoimmune disease of the connective tissue that thickens and hardens the affected areas. Recent studies provide evidence that genetic factors play an important role in the disease, and that gene expression patterns in skin biopsies differ consistently between affected and unaffected individuals. In this paper, we apply genetic programming (GP) to gene expression data from scleroderma and normal biopsies to evolve classification rules that differentiate between the two. Among the evolved rules, we found six genes whose expression levels differ between scleroderma and normal biopsies strongly enough that each gene alone classifies all the samples correctly. In addition to these genes, we also found simple rules combining two or more genes that classify all the samples correctly.
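To make concrete what it means for a single gene to classify all samples correctly, the following minimal sketch checks, for each gene, whether one expression cutoff separates the two classes. It is not the GP procedure used in the paper; it is a plain exhaustive check, and the function name, gene names, and expression values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def perfectly_separating_genes(
    expr: Sequence[Sequence[float]],
    labels: Sequence[int],
    gene_names: Sequence[str],
) -> List[Tuple[str, float, str]]:
    """Return (gene, cutoff, direction) for each gene whose expression
    separates the two classes with a single cutoff.

    expr[i][j] is the expression of gene j in sample i (hypothetical layout).
    labels[i] is 1 for a scleroderma biopsy and 0 for a normal biopsy.
    """
    hits = []
    for j, name in enumerate(gene_names):
        pos = [row[j] for row, lab in zip(expr, labels) if lab == 1]
        neg = [row[j] for row, lab in zip(expr, labels) if lab == 0]
        if min(pos) > max(neg):
            # Every scleroderma sample lies above every normal sample.
            hits.append((name, (min(pos) + max(neg)) / 2, "up"))
        elif max(pos) < min(neg):
            # Every scleroderma sample lies below every normal sample.
            hits.append((name, (max(pos) + min(neg)) / 2, "down"))
    return hits


# Toy data (invented): 4 samples x 3 genes; first two samples are scleroderma.
expr = [
    [2.0, 0.5, 1.0],
    [3.0, 0.7, -1.0],
    [0.0, 0.6, 0.5],
    [1.0, 0.4, -0.5],
]
labels = [1, 1, 0, 0]
gene_names = ["GENE_A", "GENE_B", "GENE_C"]

hits = perfectly_separating_genes(expr, labels, gene_names)
print(hits)
```

On this toy data only GENE_A is reported, because its two class ranges do not overlap. A rule that separates a small training set perfectly may still generalize poorly, so a check like this would need validation on held-out samples before any gene is treated as a candidate biomarker.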
