An Integrated Software System for Analyzing ChIP-chip and ChIP-seq Data (Supplementary Information)

We present CisGenome, a software system for analyzing genome-wide chromatin immunoprecipitation (ChIP) data. CisGenome is designed to meet the basic needs of ChIP data analysis, including visualization, data normalization, peak detection, false discovery rate computation, gene-peak association, and sequence and motif analysis. In addition to implementing previously published ChIP–microarray (ChIP-chip) analysis methods, the software contains statistical methods designed specifically for ChIP sequencing (ChIP-seq) data, which are obtained by coupling ChIP with massively parallel sequencing. The modular design of CisGenome supports both interactive analysis through a graphical user interface and customized batch-mode computation for advanced data mining. A built-in browser allows visualization of array images, signals, gene structure, conservation, and DNA sequence and motif information. We demonstrate the use of these tools with a comparative analysis of ChIP-chip and ChIP-seq data for the transcription factor NRSF/REST, a study of ChIP-seq analysis with and without a negative control sample, and an analysis of a new motif in Nanog- and Sox2-binding regions.
