Finding regions of aberrant DNA copy number associated with tumor phenotype

DNA copy number alterations are a hallmark of cancer. Understanding their role in tumor progression can help improve diagnosis, prognosis and therapy selection for cancer patients and can contribute to the development of personalised therapies. High-resolution, genome-wide measurements of DNA copy number changes for large cohorts of tumors are currently available, owing to the rapid development of technologies like array comparative genomic hybridization (arrayCGH). In this manuscript, we introduce a computational pipeline for statistical analysis of tumor cohorts, which can help extract relevant patterns of copy number aberrations and infer their association with various phenotypic indicators. The pipeline makes use of machine learning techniques for classification and feature selection, with emphasis on interpretable models (linear models with penalties, tree-based models). The main challenges that our methods face are the high dimensionality of the arrays compared to the small number of tumor samples available, as well as the large correlations between copy number estimates measured at neighboring genomic locations. Consequently, feature selection is unstable and depends strongly on the set of training samples, which leads to irreproducible signatures across different clinical studies. We also show that the feature ranking given by several widely used methods for feature selection is biased due to the large correlations between features. In order to correct for the bias and instability of the feature ranking, we introduce a dimension reduction step in our pipeline, consisting of multivariate segmentation of the set of arrays. We present three algorithms for multivariate segmentation, which are based on identifying recurrent DNA breakpoints or DNA regions of constant copy number profile. The multivariate segmentation constitutes the basis for computing a smaller set of super-features, by summarizing the DNA copy number within the segmentation regions.
Using the super-features for supervised classification, we improve the interpretability and stability of the models, where the baseline for comparison consists of classification models trained on probe data. We validated the methods by training models for prediction of the phenotype of breast cancers and neuroblastoma tumors. We show that the multivariate segmentation step yields higher model stability without decreasing the accuracy of the prediction. We obtain substantial dimension reduction (up to 200-fold fewer predictors), which recommends the multivariate segmentation procedures not only for the purpose of phenotype prediction, but also as a preprocessing step for downstream integration with other data types. The interpretability of the models is also improved, revealing important associations between copy number aberrations and phenotype. For example, we show that a very informative predictor that distinguishes between inflammatory and non-inflammatory breast cancers with ERBB2 amplification is the co-amplification of the genomic region located in the immediate vicinity of the ERBB2 gene locus. Therefore, we conclude that the size of the amplicon is associated with the cancer subtype, a hypothesis present elsewhere in the literature. In the case of neuroblastoma tumors, we show that patients belonging to different age subgroups are characterized by distinct copy number patterns, especially when the subgroups are defined as older or younger than 16-18 months. Indeed, considering a large set of age cutoffs, our prediction models are most accurate if the cutoff is around 16-18 months. We thereby confirm the recommendation for a higher age cutoff than 12 months.
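
The super-feature step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `super_features`, the representation of a segmentation as a list of probe indices where segments start, and the choice of the mean as the summary statistic are all assumptions made for illustration. Given a samples-by-probes matrix of copy number estimates and a set of breakpoints shared across arrays, it returns one feature per segment.

```python
import numpy as np


def super_features(X, breakpoints):
    """Summarize probe-level copy number within shared segments.

    X           : (n_samples, n_probes) array, probes in genomic order.
    breakpoints : probe indices at which a new segment starts
                  (hypothetical output of a multivariate segmentation).
    Returns an (n_samples, n_segments) array of per-segment means.
    The mean is an assumed summary; a median would work the same way.
    """
    X = np.asarray(X, dtype=float)
    n_probes = X.shape[1]
    bp = np.asarray(breakpoints, dtype=int)
    if bp.size and (bp.min() < 0 or bp.max() > n_probes):
        raise ValueError("breakpoints must lie within [0, n_probes]")
    # Segment edges: start of array, each breakpoint, end of array.
    edges = np.unique(np.concatenate(([0], bp, [n_probes])))
    # Average the probes that fall inside each [start, end) segment.
    cols = [X[:, s:e].mean(axis=1) for s, e in zip(edges[:-1], edges[1:])]
    return np.column_stack(cols)


# Toy example: 2 samples, 6 probes, segments [0:2], [2:5], [5:6].
X_toy = np.array([[0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
Z_toy = super_features(X_toy, breakpoints=[2, 5])
```

In this toy case six probe-level predictors collapse into three segment-level predictors, which can then be passed to any classifier in place of the probe data.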


[227]  J. Davison,et al.  Genomic differences between estrogen receptor (ER)‐positive and ER‐negative human breast carcinoma identified by single nucleotide polymorphism array comparative genome hybridization analysis , 2011, Cancer.

[228]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[229]  P. Leder,et al.  Translocation of the c-myc gene into the immunoglobulin heavy chain locus in human Burkitt lymphoma and murine plasmacytoma cells. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[230]  R. Spang,et al.  Predicting the clinical status of human breast cancer by using gene expression profiles , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[231]  L. Chin,et al.  High-resolution characterization of the pancreatic adenocarcinoma genome , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[232]  Marie-France Sagot,et al.  Analysis of fine-scale mammalian evolutionary breakpoints provides new insight into their relation to genome organisation , 2009 .

[233]  J. Peterse,et al.  Prediction of BRCA1-association in hereditary non-BRCA1/2 breast carcinomas with array-CGH , 2008, Breast Cancer Research and Treatment.

[234]  S. Gabriel,et al.  Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. , 2010, Cancer cell.

[235]  S. Shah,et al.  Computational methods for identification of recurrent copy number alteration patterns by array CGH , 2009, Cytogenetic and Genome Research.

[236]  Jane Fridlyand,et al.  High-resolution analysis of DNA copy number alterations in colorectal cancer by array-based comparative genomic hybridization. , 2004, Carcinogenesis.

[237]  Jane Fridlyand,et al.  Bladder Cancer Stage and Outcome by Array-Based Comparative Genomic Hybridization , 2005, Clinical Cancer Research.

[238]  J. Lupski,et al.  Mechanisms of change in gene copy number , 2009, Nature Reviews Genetics.

[239]  R. Houlston,et al.  Association between chromosomal instability and prognosis in colorectal cancer: a meta-analysis , 2008, Gut.

[240]  R. Koenker,et al.  Regression Quantiles , 2007 .

[241]  D. Pellman,et al.  From polyploidy to aneuploidy, genome instability and cancer , 2004, Nature Reviews Molecular Cell Biology.

[242]  Yusuke Nakamura,et al.  Allelic Loss on Chromosome 9q Is Associated with Lymph Node Metastasis of Primary Breast Cancer , 1998, Japanese journal of cancer research : Gann.

[243]  Nazneen Rahman,et al.  Generation of trisomies in cancer cells by multipolar mitosis and incomplete cytokinesis , 2010, Proceedings of the National Academy of Sciences.

[244]  François Bourguignon,et al.  Decomposable Income Inequality Measures , 1979 .

[245]  Kesheng Wang,et al.  A Bayesian segmentation approach to ascertain copy number variations at the population level , 2009, Bioinform..

[246]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[247]  Kevin P. Murphy,et al.  Model-based clustering of array CGH data , 2009, Bioinform..

[248]  Shou-Jiang Gao,et al.  Viruses and human cancer: from detection to causality. , 2011, Cancer letters.

[249]  Jeroen de Ridder,et al.  Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data , 2008, Nucleic acids research.

[250]  H. Varmus,et al.  Amplification of N-myc in untreated human neuroblastomas correlates with advanced disease stage. , 1984, Science.

[251]  Ramón Díaz-Uriarte,et al.  Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[252]  F. Toledo,et al.  The origin of chromosome rearrangements at early stages of AMPD2 gene amplification in Chinese hamster cells , 1993, Current Biology.

[253]  H. Lodish Molecular Cell Biology , 1986 .

[254]  Kenny Q. Ye,et al.  Novel patterns of genome rearrangement and their association with survival in breast cancer. , 2006, Genome research.

[255]  W. Kuo,et al.  High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays , 1998, Nature Genetics.

[256]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[257]  Franck Picard,et al.  A statistical approach for array CGH data analysis , 2005, BMC Bioinformatics.

[258]  J. Murnane,et al.  DNA amplification by breakage/fusion/bridge cycles initiated by spontaneous telomere loss in a human cancer cell line. , 2002, Neoplasia.

[259]  C. Sawyers,et al.  Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome. , 2001, The New England journal of medicine.