Comparison of Clustering Methods for Investigation of Genome-Wide Methylation Array Data

Genome-wide methylation arrays have proved highly informative for investigating both clinical and biological questions in human epigenomics. Clustering methods, used either to explore these data or to compare against an a priori grouping (e.g., normal versus disease), allow groupings of the data to be assessed without user bias. However, no consensus has been reached on which methods to use for clustering methylation array data. To determine the most appropriate clustering method for the analysis of Illumina array methylation data, a collection of data sets was simulated and used to compare clustering methods. Both hierarchical and non-hierarchical clustering methods (k-means, k-medoids, and fuzzy clustering algorithms) were compared using a range of distance and linkage methods. As no single method consistently outperformed the others across the different simulations, we propose a method that captures the best clustering outcome based on an additional measure, the silhouette width. This approach produced consistently higher cluster accuracy than any one method used in isolation.
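
The idea of choosing among clustering outcomes by silhouette width can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_silhouette`, `select_best`), the toy beta values, the Euclidean distance, and the two candidate labellings (`method_a`, `method_b`) are all assumptions made for illustration. The candidates stand in for the outputs of different clustering methods applied to the same samples.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def mean_silhouette(points, labels):
    """Average silhouette width (Rousseeuw, 1987) of one labelling."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the sample's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def select_best(points, candidates):
    """Score every candidate labelling and return the best name plus all scores."""
    scores = {name: mean_silhouette(points, labs) for name, labs in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical beta values for six samples at three CpG sites.
samples = [
    (0.10, 0.15, 0.20),
    (0.12, 0.18, 0.22),
    (0.08, 0.14, 0.19),
    (0.80, 0.85, 0.90),
    (0.82, 0.88, 0.91),
    (0.79, 0.83, 0.87),
]

# Hypothetical outputs of two clustering methods run on the same samples.
candidates = {
    "method_a": [0, 0, 0, 1, 1, 1],
    "method_b": [0, 0, 1, 1, 1, 1],
}

best, scores = select_best(samples, candidates)
print(best, scores)
```

Because the silhouette width depends only on the data and the labelling, it can rank outcomes without reference to the true grouping, which is what makes it usable as a selection criterion when no single method is reliably best.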
