Galaxy CLIP-Explorer: a web server for CLIP-Seq data analysis

Abstract

Background: Post-transcriptional regulation via RNA-binding proteins plays a fundamental role in every organism, but the underlying regulatory mechanisms are still poorly understood. They can be elucidated by cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). CLIP-Seq answers questions about the functional role of an RNA-binding protein and its targets by determining binding sites at nucleotide resolution, together with the associated sequence and structural binding patterns. In recent years the amount of CLIP-Seq data has grown rapidly, creating a need for automated data analysis that can handle different experimental set-ups. However, noncanonical data, new protocols, and a huge variety of tools, especially for peak calling, have made it difficult to define a standard.

Findings: CLIP-Explorer is a flexible and reproducible data analysis pipeline for iCLIP data that, for the first time, also supports eCLIP, FLASH, and uvCLAP data. Individual steps, such as peak calling, can be exchanged to adapt to different experimental settings. We validate CLIP-Explorer on eCLIP data and find motifs for various proteins that are similar or nearly identical to those reported in other databases. In addition, we detect new sequence motifs for PTBP1 and U2AF2. Finally, we optimize peak calling with 3 different peak callers on RBFOX2 data, discuss the difficulty of the peak-calling step, and give advice for different experimental set-ups.

Conclusion: CLIP-Explorer fills the demand for a flexible CLIP-Seq data analysis pipeline that is applicable to up-to-date CLIP protocols. The article further shows the limitations of current peak-calling algorithms and the importance of robust peak detection.
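The abstract notes that individual steps, such as peak calling, can be exchanged to suit a given protocol. The Python sketch below illustrates that idea in miniature; it is not CLIP-Explorer's code (CLIP-Explorer is a Galaxy-based pipeline built from existing tools). It assumes the iCLIP convention that reverse transcription stops at the cross-linked nucleotide, so the cross-link site is taken as the base immediately 5' of the read start; other protocols, such as paired-end eCLIP, derive the site from a different read. The `Read` type, `crosslink_site`, `run_pipeline`, and both toy peak callers are hypothetical stand-ins for real tools.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Site = Tuple[str, int, str]        # (chrom, 0-based position, strand)
Peak = Tuple[str, int, int, str]   # (chrom, start, end exclusive, strand)
PeakCaller = Callable[[Counter], List[Peak]]


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal aligned read: 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def crosslink_site(read: Read) -> Site:
    """Assumed iCLIP convention: the cross-link is the nucleotide just 5' of the read.

    A plus-strand read starting at position 0 would yield -1; a real tool would drop it.
    """
    if read.strand == "+":
        return (read.chrom, read.start - 1, "+")
    # On the minus strand the read's 5' end is at end - 1, so the upstream base is at end.
    return (read.chrom, read.end, "-")


def count_crosslinks(reads: Iterable[Read]) -> Counter:
    """Count cross-link events per single-nucleotide site."""
    return Counter(crosslink_site(r) for r in reads)


def threshold_caller(min_count: int) -> PeakCaller:
    """Toy caller: keep sites with >= min_count events and merge adjacent sites."""
    def call(counts: Counter) -> List[Peak]:
        hits = sorted((s for s, c in counts.items() if c >= min_count),
                      key=lambda s: (s[0], s[2], s[1]))  # chrom, strand, position
        peaks: List[Peak] = []
        for chrom, pos, strand in hits:
            prev = peaks[-1] if peaks else None
            if prev and prev[0] == chrom and prev[3] == strand and prev[2] == pos:
                peaks[-1] = (chrom, prev[1], pos + 1, strand)  # extend by one nucleotide
            else:
                peaks.append((chrom, pos, pos + 1, strand))
        return peaks
    return call


def top_n_caller(n: int) -> PeakCaller:
    """Toy caller: report the n most frequent cross-link sites as 1-nt peaks."""
    def call(counts: Counter) -> List[Peak]:
        best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        return sorted((chrom, pos, pos + 1, strand) for (chrom, pos, strand), _ in best)
    return call


def run_pipeline(reads: Iterable[Read], peak_caller: PeakCaller) -> List[Peak]:
    """Cross-link counting is fixed; the peak-calling step is a swappable argument."""
    return peak_caller(count_crosslinks(reads))


reads = [
    Read("chr1", 101, 130, "+"), Read("chr1", 101, 125, "+"),
    Read("chr1", 102, 140, "+"), Read("chr1", 102, 128, "+"),
    Read("chr1", 200, 230, "-"),
]
print(run_pipeline(reads, threshold_caller(2)))  # [('chr1', 100, 102, '+')]
print(run_pipeline(reads, top_n_caller(2)))      # [('chr1', 100, 101, '+'), ('chr1', 101, 102, '+')]
```

Because the caller is passed in as a function, replacing one peak-calling strategy with another changes the reported peaks without touching the upstream counting step, which mirrors the kind of flexibility the abstract describes. Real peak callers also model background signal and other protocol-specific effects, which these toys do not.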
