DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops

The protein Yin Yang 1 (YY1) can form dimers that facilitate interactions between active enhancers and promoter-proximal elements. YY1-mediated enhancer-promoter interaction is a general feature of mammalian gene control. Several computational methods have recently been developed to characterize interactions between DNA elements by elucidating important features of chromatin folding; however, none of them identifies YY1-mediated chromatin loops. In this study, we developed DeepYY1, a deep learning algorithm based on word2vec, to determine whether a pair of YY1 motifs forms a loop. The proposed models achieved high prediction performance (AUCs $\ge$ 0.93) on both training and testing datasets in different cell types, demonstrating that DeepYY1 performs well in identifying YY1-mediated chromatin loops. Our study also suggests that sequence plays an important role in the formation of YY1-mediated chromatin loops. Furthermore, we briefly discuss the distribution of replication origin sites within the loops. Finally, we established a user-friendly web server, which is freely accessible at http://lin-group.cn/server/DeepYY1.
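To make two ingredients of the abstract concrete, the following minimal Python sketch shows how a DNA sequence might be split into overlapping k-mer "words" (the usual input form for a word2vec-style embedding) and how an AUC can be computed from predicted scores. It is an illustration under stated assumptions, not the DeepYY1 implementation: the abstract does not specify the tokenization or the value of k, so the function names `kmer_tokens` and `roc_auc`, the default `k = 6`, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("words").

    Assumption: overlapping k-mers with stride 1; k = 6 is an arbitrary default.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequence around a hypothetical motif site, tokenized into 3-mers.
    print(kmer_tokens("ACGTAC", k=3))
    # Toy labels (1 = loop, 0 = no loop) and hypothetical model scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a pipeline of this kind, the token lists from the two motif-flanking sequences would be passed to an embedding model, and a classifier trained on the resulting vectors would produce the scores evaluated by the AUC.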
