Computational analysis and modeling of genome-scale avidity distribution of transcription factor binding sites in ChIP-PET experiments.

Advances in high-throughput technologies such as ChIP-chip and ChIP-PET (Chromatin Immuno-Precipitation Paired-End diTag), together with the availability of human and mouse genome sequences, now make it possible to identify transcription factor binding sites (TFBSs) and to analyze mechanisms of gene regulation at the genome-wide level. Here, we develop a computational approach that uses ChIP-PET data and statistical modeling to assess experimental noise and to identify reliable TFBSs for the c-Myc, STAT1 and p53 transcription factors in the human genome. We propose a probabilistic mixture model and develop programs for Monte Carlo simulation of ChIP-PET data, both to define the background noise of sequence clustering and to identify the probability function of specific DNA-protein binding in the eukaryotic genome. The method is highly reproducible; it not only distinguishes bona fide TFBSs from non-specific binding sites with high specificity, but also provides an algorithmic and computational basis for further optimization of the experimental parameters of the ChIP-PET method.
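
The idea of using Monte Carlo simulation to define the background noise of sequence clustering can be illustrated with a minimal sketch. It is not the authors' program. It assumes a simple null model in which fixed-length PET fragments land uniformly at random on a linear genome, and it treats a cluster as a chain of overlapping fragments. The function names (`simulate_background`, `background_tail`) and all parameter values are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def simulate_background(n_pets, genome_len, pet_len, rng):
    """Place n_pets fragments uniformly at random (assumed null model) and
    return a Counter mapping cluster size -> number of clusters.
    A cluster is a maximal chain of overlapping fragments."""
    starts = sorted(rng.randrange(genome_len - pet_len + 1) for _ in range(n_pets))
    sizes = Counter()
    cluster_size = 0
    cluster_end = -1
    for s in starts:
        if cluster_size and s < cluster_end:
            # Fragment overlaps the current cluster: extend it.
            cluster_size += 1
            cluster_end = max(cluster_end, s + pet_len)
        else:
            # Close the previous cluster (if any) and start a new one.
            if cluster_size:
                sizes[cluster_size] += 1
            cluster_size = 1
            cluster_end = s + pet_len
    if cluster_size:
        sizes[cluster_size] += 1
    return sizes


def background_tail(n_pets, genome_len, pet_len, n_runs=200, seed=0):
    """Pool cluster sizes over n_runs simulations and return a dict
    k -> estimated P(cluster size >= k) under the random-placement null."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(n_runs):
        total.update(simulate_background(n_pets, genome_len, pet_len, rng))
    n_clusters = sum(total.values())
    tail = {}
    running = 0
    for k in range(max(total), 0, -1):
        running += total[k]
        tail[k] = running / n_clusters
    return tail


if __name__ == "__main__":
    # Illustrative parameters only; real experiments differ in scale.
    tail = background_tail(n_pets=2000, genome_len=1_000_000, pet_len=500, n_runs=20)
    for k in sorted(tail):
        print(k, tail[k])
```

Under these assumptions, an observed cluster of size k is a more credible binding site when the simulated tail probability for k is very small. A fuller treatment along the lines described above would combine this noise component with a specific-binding component in a mixture model.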
