Supervised classification of combined copy number and gene expression data

In this paper we apply a predictive profiling method to genome copy number aberrations (CNA), in combination with gene expression and clinical data, to identify molecular patterns of cancer pathophysiology. Predictive models and optimal feature lists for each platform are developed with an SVM-based machine learning system under complete validation. Ranked lists of genome CNA sites (assessed by array comparative genomic hybridization, aCGH) and of differentially expressed genes (assessed by microarray profiling with Affymetrix HG-U133A chips) are computed and combined on a breast cancer dataset to discriminate the Luminal/ER+ (Lum/ER+) and Basal-like/ER- classes. Different encodings are developed and applied to the CNA data, and predictive variable selection is discussed. We analyze how the profiling information from the two platforms combines, also taking the pathophysiological data into account. We identify a specific subset of patients that responds differently to classification by chromosomal gains and losses than to classification by differentially expressed genes, corroborating the idea that genomic CNA can serve as an independent source of information for tumor classification.
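
The abstract does not specify which encodings are applied to the CNA data. As a minimal sketch of what such an encoding could look like, the Python below maps per-site log2 copy number ratios either to a ternary gain/loss/no-change code or to a pair of gain and loss indicator features. The function names, the two encodings, and the thresholds of +0.2 and -0.2 are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def encode_ternary(
    log2_ratios: Sequence[float],
    gain_thr: float = 0.2,   # hypothetical gain threshold, not from the paper
    loss_thr: float = -0.2,  # hypothetical loss threshold, not from the paper
) -> List[int]:
    """Map each log2 ratio to +1 (gain), -1 (loss), or 0 (no change)."""
    encoded = []
    for ratio in log2_ratios:
        if ratio >= gain_thr:
            encoded.append(1)
        elif ratio <= loss_thr:
            encoded.append(-1)
        else:
            encoded.append(0)
    return encoded


def encode_indicators(
    log2_ratios: Sequence[float],
    gain_thr: float = 0.2,
    loss_thr: float = -0.2,
) -> List[int]:
    """Emit two binary features per site: (is_gain, is_loss)."""
    features = []
    for state in encode_ternary(log2_ratios, gain_thr, loss_thr):
        features.extend([int(state == 1), int(state == -1)])
    return features


# Toy log2 ratios for four CNA sites of one sample (made-up values).
sample = [0.45, 0.05, -0.6, 0.2]
ternary = encode_ternary(sample)
indicators = encode_indicators(sample)
```

The indicator form doubles the number of features but lets a linear classifier such as an SVM weight gains and losses at the same site independently, whereas the ternary form forces them onto a single axis.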
