Analyzing tumor gene expression profiles

A brief introduction to high-throughput technologies for measuring and analyzing gene expression is given. Various supervised and unsupervised data mining methods for analyzing the resulting high-dimensional data are discussed. The main emphasis is on supervised machine learning methods for classification and diagnostic prediction based on tumor gene expression profiles. Furthermore, methods for ranking genes by their importance for the classification are explored. The approaches are illustrated by exploratory studies of two sets of retrospective clinical data from routine tests: diagnostic prediction of small round blue cell tumors (SRBCT) of childhood, and determination of the estrogen receptor (ER) status of sporadic breast cancer. Classification performance is assessed using blind tests. These studies demonstrate the feasibility of machine-learning-based molecular cancer classification.
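
The abstract does not specify which classifiers or gene-ranking methods are used, so the following is only a minimal sketch of the general workflow under stated assumptions. It uses synthetic data, a nearest-centroid classifier, and a signal-to-noise gene score (absolute difference of class means divided by the sum of class standard deviations) as illustrative stand-ins, not as the methods of this work. The function names (make_data, rank_genes, train_centroids, predict) are hypothetical. What it tries to show is that genes are ranked and the classifier is fitted on the training samples only, while the blind test samples are held back until the final evaluation.

```python
import random
import statistics


def make_data(n_samples, n_genes, n_informative, rng):
    """Hypothetical synthetic data: two classes (0 and 1); only the first
    n_informative genes carry class signal, the rest are pure noise."""
    samples, labels = [], []
    for i in range(n_samples):
        label = i % 2
        profile = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            profile[g] += 3.0 if label == 1 else -3.0  # class-dependent shift
        samples.append(profile)
        labels.append(label)
    return samples, labels


def rank_genes(samples, labels):
    """Rank genes by a signal-to-noise score, highest first (assumed
    stand-in for a gene importance measure)."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 1]
        b = [s[g] for s, y in zip(samples, labels) if y == 0]
        spread = statistics.stdev(a) + statistics.stdev(b)
        diff = abs(statistics.mean(a) - statistics.mean(b))
        scores.append((diff / spread if spread > 0 else 0.0, g))
    return [g for _, g in sorted(scores, reverse=True)]


def train_centroids(samples, labels, genes):
    """Mean expression of each selected gene within each class."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [statistics.mean(m[g] for m in members) for g in genes]
    return centroids


def predict(centroids, profile, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((profile[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


rng = random.Random(0)
train_x, train_y = make_data(40, 200, 5, rng)
# Blind test set: never used for gene ranking or training.
test_x, test_y = make_data(20, 200, 5, rng)

top_genes = rank_genes(train_x, train_y)[:10]  # selection uses training data only
centroids = train_centroids(train_x, train_y, top_genes)
predictions = [predict(centroids, x, top_genes) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)

print("top-ranked genes:", top_genes)
print("blind-test accuracy:", accuracy)
```

Keeping gene selection inside the training data is the key design choice here: ranking genes on all samples, including the test set, would leak test information into the model and tend to inflate the apparent blind-test accuracy.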