Single base-pair resolution analysis of DNA binding motif with MoMotif reveals an oncogenic function of CTCF zinc-finger 1 mutation

Abstract

Defining the impact of missense mutations on the recognition of DNA motifs depends heavily on bioinformatic tools that define DNA binding elements. However, classical motif analysis tools remain limited in their capacity to identify subtle changes in complex binding motifs between distinct conditions. To overcome this limitation, we developed a new tool, MoMotif, that enables sensitive identification, at single base-pair resolution, of complex or subtle alterations to core binding motifs discerned from ChIP-seq data. We employed MoMotif to define the previously uncharacterized recognition motif of CTCF zinc-finger 1 (ZF1) and to determine the impact of CTCF ZF1 mutation on its association with chromatin. Mutations of CTCF ZF1 are exclusive to breast cancer and are associated with metastasis and therapeutic resistance, but the underlying mechanisms are unclear. Using MoMotif, we identified an extension of the CTCF core binding motif that requires a functional ZF1 for appropriate binding. Using a combination of ChIP-seq and RNA-seq, we discovered that the inability to bind this extended motif drives an altered transcriptional program associated with the oncogenic phenotypes observed clinically. Our study demonstrates that MoMotif is a powerful new tool for comparative ChIP-seq analysis and for characterising DNA-protein contacts.
