metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA

Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. A better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP-seq and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model in which Shep associates with chromatin cotranscriptionally and is then recruited to insulator complexes in trans, where it plays a negative role in insulator activity.
