Enhanced performance of gene expression predictive models with protein-mediated spatial chromatin interactions

Many studies have attempted to predict gene expression from sequence, epigenetic marks, and other factors. To improve these predictions, we investigated adding protein-specific 3D interactions, which play a major role in the condensation of chromatin structure in the cell nucleus. To this end, we adopted the architecture of one of the state-of-the-art algorithms, ExPecto (J. Zhou et al., 2018), and examined how the model metrics change when spatially relevant data are added. We used ChIA-PET interactions mediated by cohesin (24 cell lines), CTCF (4 cell lines), and RNA polymerase II (RNAPOL2; 4 cell lines). The outcome of the study is the Spatial Gene Expression (SpEx) algorithm, which shows statistically significant improvements in most cell lines.
