Dynamic Data Driven Application Systems for Identification of Biomarkers in DNA Methylation

The term ‘epigenetic’ refers to all heritable alterations in gene function that occur without any change in the deoxyribonucleic acid (DNA) sequence. Epigenetic modifications play a crucial role in the development and differentiation of various diseases, including cancer. One epigenetic alteration that has garnered a great deal of attention is DNA methylation, i.e., the addition of a methyl group to cytosine. Recent studies have shown that different tumor types have distinct methylation profiles. Identifying the idiosyncratic DNA methylation profiles of different tumor types and subtypes can provide invaluable insights for accurate diagnosis, early detection, and tailored cancer treatment. In this study, our goal is to identify the informative genes (biomarkers) whose change in methylation level correlates with a specific cancer type or subtype. To achieve this goal, we propose a novel high-dimensional learning framework, inspired by the dynamic data driven application systems (DDDAS) paradigm, that identifies biomarkers, detects outliers, and improves the quality of the resulting disease detection. The proposed framework first applies principal component analysis (PCA), then performs hierarchical clustering (HCL) of the observations, and finally determines the informative genes based on the HCL predictions. We demonstrate the capabilities and performance of the framework on a lung cancer DNA methylation dataset from the Gene Expression Omnibus (GEO) DataSets. Preliminary results show that, at a reasonable computational cost, our framework outperforms conventional clustering algorithms with embedded dimension reduction in identifying informative genes and outliers and in removing their contaminating effects.
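The PCA, HCL, and gene-selection steps can be illustrated with a minimal sketch. The abstract does not specify the number of retained components, the linkage criterion, the distance metric, or how genes are scored against the HCL predictions, so these are our own assumptions here: an SVD-based PCA on a samples × genes matrix, average-linkage agglomerative clustering on Euclidean distance in PCA score space, and a two-group signal-to-noise score |μ₀ − μ₁| / (σ₀ + σ₁) to rank genes against the cluster labels. The function names and the synthetic data are hypothetical, and outlier handling is omitted.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                         # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                 # sample scores, shape (n, k)


def hierarchical_clustering(Z: np.ndarray, n_clusters: int) -> np.ndarray:
    """Naive agglomerative clustering with average linkage on Euclidean distance."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # pairwise distances
    clusters = [[i] for i in range(n)]              # start from singletons
    while len(clusters) > n_clusters:
        best_d, best_a, best_b = np.inf, 0, 1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # average linkage
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        clusters[best_a] += clusters[best_b]        # merge the closest pair
        del clusters[best_b]                        # best_b > best_a, so best_a stays valid
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def rank_informative_genes(X: np.ndarray, labels: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Rank genes by a two-group signal-to-noise score (assumes labels are 0 and 1)."""
    g0, g1 = X[labels == 0], X[labels == 1]
    snr = np.abs(g0.mean(axis=0) - g1.mean(axis=0)) / (g0.std(axis=0) + g1.std(axis=0) + eps)
    return np.argsort(snr)[::-1]                    # gene indices, most informative first


# Hypothetical demo on synthetic beta values: 20 samples x 50 genes, two planted groups.
rng = np.random.default_rng(0)
X = 0.5 + 0.05 * rng.standard_normal((20, 50))
X[:10, :5] -= 0.3   # genes 0-4 hypomethylated in the first 10 samples
X[10:, :5] += 0.3   # and hypermethylated in the last 10 samples
labels = hierarchical_clustering(pca_scores(X, n_components=2), n_clusters=2)
print(labels)
print(rank_informative_genes(X, labels)[:5])
```

The naive merge loop is at least cubic in the number of samples, which is tolerable for a few hundred observations. For larger cohorts, an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` would be the natural replacement.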
