Intra-Cluster Distance Minimization in DNA Methylation Analysis Using an Advanced Tabu-Based Iterative $k$-Medoids Clustering Algorithm (T-CLUST)

Recent advances in DNA methylation profiling have paved the way for understanding the epigenetic mechanisms underlying various diseases, such as cancer. Conventional distance-based clustering algorithms (e.g., hierarchical and $k$-means clustering) have been used heavily in such profiling because they are fast enough for high-throughput analysis, but they commonly converge to suboptimal solutions and/or trivial clusters because of their greedy search nature. Hence, methodologies are needed that improve the quality of the clusters formed by these algorithms without sacrificing their speed. In this study, we introduce three related algorithms for a complete high-throughput methylation analysis: a variance-based dimension reduction algorithm to handle the high dimensionality of the data, an outlier detection algorithm to identify outliers in the data, and an advanced Tabu-based iterative $k$-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional $k$-medoids algorithm. The performance of the proposed algorithms is demonstrated on nine real DNA methylation datasets obtained from the Gene Expression Omnibus DataSets database. The accuracy of cluster identification obtained by the proposed algorithms is higher than that of hierarchical and $k$-means clustering, as well as the conventional methods. The algorithms are implemented in MATLAB and are available at: http://www.coe.miami.edu/simlab/tclust.html.
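
The abstract names the pipeline stages but not their details, so the following is only a minimal sketch of the general idea, not the authors' MATLAB implementation of T-CLUST. It assumes samples are rows of numeric methylation values, keeps the highest-variance features, and then restarts a plain alternating $k$-medoids from initial medoids that a tabu list has not yet used. The function names (`top_variance_features`, `k_medoids`, `tabu_k_medoids`), the Euclidean distance, the restart count, and the tabu rule are all illustrative assumptions; the outlier detection step is not sketched.

```python
import random
from statistics import pvariance


def top_variance_features(rows, n_keep):
    """Return the (sorted) indices of the n_keep columns with the largest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, init_medoids, max_iter=100):
    """Plain alternating k-medoids started from the given medoid indices."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        # Assignment step: each medoid seeds its own cluster.
        clusters = {m: [m] for m in medoids}
        for i, p in enumerate(points):
            if i in clusters:
                continue  # a medoid always stays in its own cluster
            nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(euclidean(points[c], points[i]) for i in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), total_cost(points, medoids)


def tabu_k_medoids(points, k, n_restarts=20, seed=0):
    """Restart k-medoids from initial medoids not on the tabu list (assumed rule)."""
    rng = random.Random(seed)
    tabu = set()
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_restarts):
        candidates = [i for i in range(len(points)) if i not in tabu]
        if len(candidates) < k:
            # Every point has been tried as a start; reset the memory.
            tabu.clear()
            candidates = list(range(len(points)))
        init = rng.sample(candidates, k)
        tabu.update(init)
        medoids, cost = k_medoids(points, init)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

A typical call would first reduce the data with `keep = top_variance_features(rows, n_keep)`, project each row onto `keep`, and then pass the reduced rows to `tabu_k_medoids`. Because the best result over all restarts is kept, the final intra-cluster cost cannot be worse than that of any single run started from one of the tried initial medoid sets.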
