Development of an informatics analytics workflow for DAP-seq data exploration and validation for auxin response factors in maize

DNA affinity purification sequencing (DAP-seq) is a recently developed technique for transcription factor (TF) binding site discovery that produces datasets similar to those from ChIP-seq. A major advantage of DAP-seq is that it uses exogenously expressed TFs to interrogate genomic DNA directly, without the need for tagged transgenic lines or gene-specific antibodies, while still capturing TF binding events in their genomic sequence context. To assess the accuracy of DAP-seq, we used the method to generate genome-wide binding profiles of maize AUXIN RESPONSE FACTORS (ARFs). ARFs activate or repress auxin response genes and play an important role in growth and development. They therefore provide a typical scenario in which researchers would use DAP-seq to better understand how an important family of TFs regulates gene expression. Our informatics analysis workflow combines well-validated read aligners and TF binding site prediction tools with custom in-house Python scripts. We investigate how accurately pattern mining recovers ARF binding signatures, with respect to the presence and position of conserved motifs. ARFs are known to bind as dimers to pairs of TGTC motifs arranged as direct repeats, inverted repeats, and everted repeats. Based on this knowledge, our workflow mines the DAP-seq datasets for genomic regions carrying such signatures. After each round of mining, we validate the results against known patterns and domain knowledge from our collaborators.