Re-implementation of an algorithm to integrate transcriptome and ChIP-seq data

Transcription factor binding to a gene's regulatory region induces or represses the gene's expression. Binding and expression target analysis (BETA) integrates binding and gene expression data to predict this function, that is, whether the factor induces or represses its targets. First, the regulatory potential of the factor on each gene is modeled with a decay function of the distance between its binding sites and the gene's transcription start site. Then, the differential expression statistics from an experiment in which the factor was perturbed represent the binding effect. The rank product of the two values is used to order the genes by importance. The algorithm was originally implemented in Python. We reimplemented it in R to take advantage of existing data structures and other tools for downstream analyses. Here, we attempted to replicate the findings of the original BETA paper. We applied the new implementation to the same datasets using default and varying inputs and cutoffs. We successfully replicated the original results. Moreover, we showed that the method responded appropriately to changes in the input and was robust to the choice of cutoffs in statistical testing.
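The two quantities described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the R implementation discussed here. The decay form exp(-(0.5 + 4Δ)), with Δ the peak-to-TSS distance scaled to 100 kb, is the form usually quoted for BETA and is treated as an assumption; the function names, gene names, distances, and statistics are all hypothetical.

```python
import math


def regulatory_potential(distances_bp, max_distance=100_000):
    """Sum a decaying weight over the peaks near one transcription start site.

    Assumed decay form: exp(-(0.5 + 4 * delta)), where delta is the
    peak-to-TSS distance divided by max_distance. Peaks farther away
    than max_distance contribute nothing.
    """
    total = 0.0
    for distance in distances_bp:
        distance = abs(distance)
        if distance > max_distance:
            continue
        delta = distance / max_distance
        total += math.exp(-(0.5 + 4 * delta))
    return total


def descending_ranks(values):
    """Return ranks where 1 is the largest value; ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_product(potentials, abs_stats):
    """Multiply the normalized binding rank by the normalized expression rank."""
    n = len(potentials)
    binding_ranks = descending_ranks(potentials)
    expression_ranks = descending_ranks(abs_stats)
    return [(b / n) * (e / n) for b, e in zip(binding_ranks, expression_ranks)]


# Hypothetical input: peak distances (bp) from each gene's TSS, and a
# differential expression statistic from a perturbation experiment.
peaks = {"geneA": [500, 20_000], "geneB": [90_000], "geneC": [5_000, 150_000]}
stats = {"geneA": 6.2, "geneB": -1.1, "geneC": 3.4}

genes = list(peaks)
potentials = [regulatory_potential(peaks[g]) for g in genes]
products = rank_product(potentials, [abs(stats[g]) for g in genes])

# A smaller rank product means stronger combined evidence.
ranked = sorted(zip(genes, products), key=lambda pair: pair[1])
for gene, product in ranked:
    print(f"{gene}\t{product:.3f}")
```

In this toy input, geneA has both the closest peaks and the largest absolute statistic, so it ranks first. The sign of the statistic is dropped only for ranking; it can be kept alongside the result to separate induced from repressed targets.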
