Cluster analysis for DNA methylation profiles having a detection threshold

Background: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data.

Results: We compare performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation.

Conclusion: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.
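To make the idea of "explicitly allowing for values at 0" concrete, the sketch below shows one common way to model such data: a two-part distribution with a point mass at 0 and a log-normal density for the positive values. This is an illustrative assumption only. The abstract does not specify the form of the two novel methods, and the function names `fit_two_part` and `two_part_loglik` are hypothetical.

```python
import math

def fit_two_part(values):
    """Estimate (p_zero, mu, sigma) for a point mass at 0 plus a log-normal.

    Assumed model, not the paper's: P(x = 0) = p_zero, and positive values
    follow a log-normal with log-scale mean mu and standard deviation sigma.
    """
    logs = [math.log(v) for v in values if v > 0]
    p_zero = 1 - len(logs) / len(values)
    if not logs:
        return p_zero, 0.0, 1.0  # arbitrary placeholders when nothing is positive
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    sigma = max(math.sqrt(var), 1e-6)  # floor avoids division by zero later
    return p_zero, mu, sigma

def two_part_loglik(values, p_zero, mu, sigma):
    """Log-likelihood of the values under the assumed two-part model."""
    eps = 1e-12  # guards log(0) when p_zero is exactly 0 or 1
    total = 0.0
    for v in values:
        if v == 0:
            total += math.log(max(p_zero, eps))
        else:
            z = (math.log(v) - mu) / sigma
            total += (math.log(max(1 - p_zero, eps))
                      - math.log(v * sigma * math.sqrt(2 * math.pi))
                      - 0.5 * z * z)
    return total
```

In a model-based clustering setting, a likelihood of this kind could score how well a sample's profile matches each cluster's parameters, whereas k-means treats a 0 as just another point on the continuous scale.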
