StoatyDive: Evaluation and classification of peak profiles for sequencing data

The prediction of binding sites (peak calling) is a common task in the analysis of data from methods such as crosslinking or chromatin immunoprecipitation combined with high-throughput sequencing (CLIP-Seq, ChIP-Seq). The predicted binding sites are often analyzed further, for example to predict sequence motifs or structure patterns. However, the peaks in the obtained set can vary in profile shape because of the peak caller used, different binding domains of the protein, protocol biases, or other factors. So far, a tool that evaluates and classifies predicted peaks based on their shapes has been missing. We present StoatyDive, a tool that can be used to filter for specific peak profile shapes in sequencing data such as CLIP-Seq and ChIP-Seq. StoatyDive thereby fine-tunes downstream analysis steps such as structure or sequence motif prediction and acts as a quality control. With StoatyDive we were able to classify distinct peak profile shapes from CLIP-Seq data of the histone stem-loop-binding protein (SLBP). We show the potential of StoatyDive as a quality control tool and as a filter to select different shapes based on biological or methodological questions. StoatyDive is open source and freely available under GPL-3 at https://github.com/BackofenLab/StoatyDive and on Bioconda at https://anaconda.org/bioconda/stoatydive.
