Experimental design and data analysis of Ago‐RIP‐Seq experiments for the identification of microRNA targets

The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for genome‐wide miRNA target identification have been developed in recent years; however, they have several limitations, including dependence on low‐confidence prediction programs and on artificial miRNA manipulations. Ago‐RNA immunoprecipitation combined with high‐throughput sequencing (Ago‐RIP‐Seq) is a promising alternative. However, appropriate statistical data analysis algorithms that take into account the experimental design and the inherent noise of such experiments are largely lacking. Here, we investigate the experimental design for Ago‐RIP‐Seq and examine biostatistical methods to identify de novo miRNA target genes. The statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution‐based generalized linear model. We compare them in a real‐data simulation study using plasmode data sets and evaluate how well each approach detects true miRNA targets, measured by sensitivity and false discovery rate. Our results suggest that simple approaches, such as linear regression models on (appropriately) transformed read count data, are preferable.
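
The two evaluation criteria named above, sensitivity and false discovery rate, can be illustrated with a minimal sketch. It is not the pipeline used in the study: the gene names, p-values, truth set, and the 0.05 cutoff are hypothetical, and the Benjamini-Hochberg adjustment is assumed here as the multiple-testing correction. In a plasmode setting the truth set is known by construction, so both quantities can be computed directly from the genes a method calls.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def sensitivity_and_fdr(called, truth):
    """Compare called targets with the known (spiked-in) true targets.

    called, truth: sets of gene identifiers.
    """
    true_pos = len(called & truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    fdr = (len(called) - true_pos) / len(called) if called else 0.0
    return sensitivity, fdr


# Hypothetical per-gene p-values, e.g. from a linear model on transformed counts.
pvals = {"geneA": 0.001, "geneB": 0.01, "geneC": 0.02, "geneD": 0.6, "geneE": 0.9}
# Hypothetical truth set, known by construction in a plasmode data set.
truth = {"geneA", "geneB", "geneD"}

genes = list(pvals)
adj = benjamini_hochberg([pvals[g] for g in genes])
called = {g for g, q in zip(genes, adj) if q <= 0.05}  # assumed cutoff
sens, fdr = sensitivity_and_fdr(called, truth)
print(sorted(called), round(sens, 3), round(fdr, 3))
```

With these toy inputs, three genes are called, two of which are true targets, so the sketch reports a sensitivity of 2/3 and an observed false discovery rate of 1/3. Running the same comparison for each candidate model on the same plasmode data is one way to rank the approaches.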
