DeepEBV: a deep learning model to predict Epstein-Barr virus (EBV) integration sites

MOTIVATION Epstein-Barr virus (EBV) is one of the most prevalent oncogenic DNA viruses. The integration of EBV into the host genome has been reported to play an important role in cancer development. EBV integration preference depends strongly on the local genomic environment, which makes it possible to predict EBV integration sites.

RESULTS An attention-based deep learning model, DeepEBV, was developed to predict EBV integration sites by learning local genomic features automatically. First, DeepEBV was trained and tested using data from the dsVIS database. The results showed that DeepEBV with EBV integration sequences plus Repeat peaks and 2-fold data augmentation performed the best on the training dataset. The performance of the model was then validated on an independent dataset. In addition, motifs of DNA-binding proteins could influence the selection preference of viral insertional mutagenesis. The results also showed that DeepEBV can accurately predict EBV integration hotspot genes. In summary, DeepEBV is a robust, accurate and explainable deep learning model that provides novel insights into EBV integration preferences and mechanisms.

AVAILABILITY DeepEBV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepEBV.git

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
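To make the phrase "attention-based model learning local genomic features" more concrete, the following is a minimal sketch of the general idea, not the DeepEBV architecture itself. It assumes a one-hot encoding of a DNA window and a single softmax attention layer over positions. The function names `one_hot` and `attention_pool`, the example sequence, and the random scoring vector `w` are all hypothetical and chosen only for illustration.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) array; unknown bases (e.g. N) become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def attention_pool(features, w):
    """Softmax attention over positions: score each position, then take the weighted sum."""
    scores = features @ w            # one score per position, shape (L,)
    scores = scores - scores.max()   # subtract the max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()         # weights are non-negative and sum to 1
    pooled = weights @ features      # weighted sum of position features, shape (4,)
    return weights, pooled

# Hypothetical inputs: a short toy sequence and an untrained, random scoring vector.
rng = np.random.default_rng(0)
x = one_hot("ACGTNACGGT")
w = rng.normal(size=4).astype(np.float32)
weights, pooled = attention_pool(x, w)
```

In a trained model of this kind, the per-position `weights` are what supports interpretation: positions that receive high attention can be inspected for enriched sequence motifs. Here `w` is random, so the weights carry no biological meaning.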
