Prediction of biologically significant components from microarray data: Independently Consistent Expression Discriminator (ICED)

MOTIVATION: Class distinction is a supervised learning approach that has been successfully applied to the analysis of high-throughput gene expression data. Identifying a set of genes that predicts differential biological states supports the development of basic and clinical approaches to the diagnosis of disease. The Independently Consistent Expression Discriminator (ICED) was designed to provide a more biologically relevant search criterion during predictor selection by embracing the inherent variability of gene expression in any biological state. ICED has four components: (i) normalization of raw data; (ii) assignment of weights to genes from both classes; (iii) counting of votes to determine the optimal number of predictor genes for class distinction; (iv) calculation of prediction strengths for classification results. The search criterion employed by ICED is designed to identify not only genes that are consistently expressed at one level in one class and at a consistently different level in another class, but also genes that are variable in one class and consistent in another. The result is a novel approach to accurately selecting biologically relevant predictors of differential disease states from a small number of microarray samples.

RESULTS: ICED was used to analyze the large AML/ALL training and test data set (Golub et al., 1999, Science, 286, 531-537) and a smaller data set, generated for this study, from an animal model of the childhood neurodegenerative disorder Batten disease. Both analyses correctly predicted biologically relevant perturbations that can be used for disease classification, irrespective of sample size. Furthermore, the results provided candidate proteins for future study of the disease process and for the identification of potential targets for therapeutic intervention.
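The four-step pipeline named above (normalize, weight, vote, prediction strength) can be illustrated with a minimal sketch. The abstract does not give ICED's formulas, so this example substitutes the signal-to-noise weighted-voting scheme of Golub et al. (1999) as a stand-in; it is not ICED's own weighting, and it does not capture ICED's criterion for genes that are variable in one class and consistent in another. The function names (`zscore`, `gene_weights`, `predict`), the per-sample z-score normalization, the fixed `n_genes` argument, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, stdev

def zscore(sample):
    """Step (i), assumed: standardize one sample to mean 0 and sd 1."""
    m, s = mean(sample), stdev(sample)
    return [(x - m) / s for x in sample]

def gene_weights(class_a, class_b):
    """Step (ii), assumed: Golub-style signal-to-noise weight per gene.

    Returns one (weight, midpoint) pair per gene, where the midpoint is
    the average of the two class means and serves as the decision boundary.
    """
    pairs = []
    for g in range(len(class_a[0])):
        a = [s[g] for s in class_a]
        b = [s[g] for s in class_b]
        denom = stdev(a) + stdev(b)
        weight = (mean(a) - mean(b)) / denom if denom else 0.0
        pairs.append((weight, (mean(a) + mean(b)) / 2))
    return pairs

def predict(sample, weights, n_genes):
    """Steps (iii)-(iv): the n_genes with largest |weight| vote.

    Returns (label, prediction_strength), where prediction strength is
    |V_a - V_b| / (V_a + V_b) and lies in [0, 1].
    """
    top = sorted(range(len(weights)),
                 key=lambda g: abs(weights[g][0]), reverse=True)[:n_genes]
    votes_a = votes_b = 0.0
    for g in top:
        weight, midpoint = weights[g]
        vote = weight * (sample[g] - midpoint)
        if vote > 0:
            votes_a += vote   # positive vote favors class A
        else:
            votes_b += -vote  # negative vote favors class B
    total = votes_a + votes_b
    strength = abs(votes_a - votes_b) / total if total else 0.0
    return ("A" if votes_a >= votes_b else "B"), strength

# Toy data (hypothetical): 3 training samples per class, 3 genes each.
train_a = [zscore(s) for s in [[5.0, 1.0, 3.0], [6.0, 1.2, 2.0], [5.5, 0.8, 4.0]]]
train_b = [zscore(s) for s in [[1.0, 5.0, 3.5], [1.5, 6.0, 2.5], [0.5, 5.5, 3.0]]]
weights = gene_weights(train_a, train_b)

label_a, strength_a = predict(zscore([5.2, 1.1, 3.0]), weights, n_genes=2)
label_b, strength_b = predict(zscore([1.0, 5.8, 3.0]), weights, n_genes=2)
print(label_a, strength_a, label_b, strength_b)
```

In this sketch `n_genes` is supplied by the caller. Selecting the optimal number of predictor genes, which step (iii) describes, would need an outer loop (for example, leave-one-out evaluation over candidate values) that is omitted here.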

[1] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[2] T. Ueno, et al. Specific storage of subunit c of mitochondrial ATP synthase in lysosomes of neuronal ceroid lipofuscinosis (Batten's disease), 1992, Journal of biochemistry.

[3] U. Alon, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[4] J. Ezaki, et al. Specific Delay in the Degradation of Mitochondrial ATP Synthase Subunit c in Late Infantile Neuronal Ceroid Lipofuscinosis Is Derived from Cellular Proteolytic Dysfunction Rather than Structural Alteration of Subunit c, 1996, Journal of neurochemistry.

[5] M. Štabuc-Šilih, et al. Improved prediction of decreased creatinine clearance by serum cystatin C: use in cancer patients before and during chemotherapy, 2000, Clinical chemistry.

[6] Nello Cristianini, et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data, 2000, Bioinform.

[7] Andrea Califano, et al. Analysis of Gene Expression Microarrays for Phenotype Classification, 2000, ISMB.

[8] Christian A. Rees, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[9] S H Kim, et al. Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors, 1998, Science.

[10] B. Kirschbaum, et al. Comparative gene-expression analysis, 1999, Trends in biotechnology.

[11] Byoung-Tak Zhang, et al. Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis, 2002.

[12] B. Efron. The jackknife, the bootstrap, and other resampling plans, 1987.

[13] C P Price, et al. Serum cystatin C in patients with myeloma, 2001, Clinica chimica acta; international journal of clinical chemistry.

[14] T. Darden, et al. Computational Analysis of Leukemia Microarray Expression Data Using the GA/KNN Method, 2002.

[15] Marcel Leist, et al. Cathepsin B Acts as a Dominant Execution Protease in Tumor Cell Apoptosis Induced by Tumor Necrosis Factor, 2001, The Journal of cell biology.

[16] Masumi Ito, et al. An autoantibody inhibitory to glutamic acid decarboxylase in the neurodegenerative disorder Batten disease, 2002, Human molecular genetics.

[17] J. Walker, et al. Mitochondrial ATP synthase subunit c storage in the ceroid-lipofuscinoses (Batten disease), 1992, American journal of medical genetics.

[18] P. Brown, et al. Drug target validation and identification of secondary drug target effects using DNA microarrays, 1998, Nature Medicine.

[19] Jason H. Moore, et al. Evolutionary Computation in Microarray Data Analysis, 2002.

[20] D Haussler, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines, 2000, Proceedings of the National Academy of Sciences of the United States of America.

[21] D N Palmer, et al. Batten disease and the ATP synthase subunit c turnover pathway: raising antibodies to subunit c, 1995, American journal of medical genetics.

[22] Nir Friedman, et al. Tissue classification with gene expression profiles, 2000.

[23] Gregory R. Grant, et al. Using Non-Parametric Methods in the Context of Multiple Testing to Determine Differentially Expressed Genes, 2002.

[24] Jonathan D. Cooper, et al. Targeted Disruption of the Cln3 Gene Provides a Mouse Model for Batten Disease, 1999, Neurobiology of Disease.

[25] I J Christensen, et al. Cysteine proteinase inhibitors stefin A, stefin B, and cystatin C in sera from patients with colorectal cancer: relation to prognosis, 2000, Clinical cancer research : an official journal of the American Association for Cancer Research.

[26] D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[27] Z. Naito, et al. Expression of Cathepsin B and Cystatin C in Human Breast Cancer, 2001, Surgery Today.