Improved Discovery of Molecular Interactions in Genome-Scale Data with Adaptive Model-Based Normalization

Background: High-throughput molecular-interaction studies using immunoprecipitation (IP) or affinity purification are powerful and widely used in biological research. One important application is identifying the set of RNAs that interact with a particular RNA-binding protein (RBP). The distinctive statistical challenge here is to delineate the specific set of RNAs that are enriched in one sample relative to another, typically a specific IP compared with a non-specific control that models background. The choice of normalization procedure critically affects the number of RNAs identified as interacting with an RBP at a given significance threshold, yet existing normalization methods make assumptions that are often fundamentally inaccurate when applied to IP enrichment data.

Methods: In this paper, we present a new normalization methodology designed specifically for identifying enriched RNA or DNA sequences in an IP. The normalization, called adaptive (AD) normalization, uses a basic model of the IP experiment and is not a variant of mean, quantile, or any other previously proposed normalization method. The approach is evaluated statistically and tested with simulated and empirical data.

Results and Conclusions: For the RBPs we analyzed, AD normalization, compared with median normalization, yields a greatly increased range in the number of enriched RNAs identified, fewer false positives, and better overall concordance with independent biological evidence. The approach also applies to the study of pairwise RNA, DNA, and protein interactions, such as the analysis of transcription factors by chromatin immunoprecipitation (ChIP), and to any other experiment comparing samples from two conditions in which one sample contains an enriched subset of the other.
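To make concrete why a standard method such as median normalization can be ill-suited to IP enrichment data, the sketch below applies it to a hypothetical toy model. It is not the AD normalization method, whose details are not given here. The model is an assumption chosen for illustration: background features have a true log2(IP/control) ratio of 0, enriched features have a fixed positive ratio, and all observed values share an unknown global offset plus Gaussian noise. The function names (`simulate_log_ratios`, `median_normalize`, `count_called`) and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_log_ratios(n_features, frac_enriched, enrichment, offset, noise_sd, seed=0):
    """Hypothetical toy model of log2(IP/control) ratios.

    Background features have true log ratio 0 and enriched features have
    true log ratio `enrichment`. Every observed value is also shifted by an
    unknown global `offset` (e.g. unequal total signal between IP and
    control) and perturbed by Gaussian noise.
    Returns (list of is-enriched flags, list of observed log ratios).
    """
    rng = random.Random(seed)
    n_enriched = round(n_features * frac_enriched)
    flags = [True] * n_enriched + [False] * (n_features - n_enriched)
    observed = [
        (enrichment if f else 0.0) + offset + rng.gauss(0.0, noise_sd)
        for f in flags
    ]
    return flags, observed


def median_normalize(log_ratios):
    """Median normalization: subtract the median log ratio.

    This implicitly assumes most features are NOT enriched, so that the
    median estimates the background (log ratio 0) level.
    """
    m = statistics.median(log_ratios)
    return [x - m for x in log_ratios]


def count_called(flags, normalized, threshold):
    """Return (true positives, false positives) for calls normalized > threshold."""
    tp = sum(1 for f, x in zip(flags, normalized) if x > threshold and f)
    fp = sum(1 for f, x in zip(flags, normalized) if x > threshold and not f)
    return tp, fp


if __name__ == "__main__":
    # Compare a small enriched fraction with one where most features are enriched.
    for frac in (0.1, 0.6):
        flags, obs = simulate_log_ratios(
            n_features=1000, frac_enriched=frac, enrichment=3.0,
            offset=-1.5, noise_sd=0.3, seed=1,
        )
        tp, fp = count_called(flags, median_normalize(obs), threshold=1.0)
        print(f"enriched fraction {frac}: {sum(flags)} truly enriched, "
              f"{tp} recovered, {fp} false positives")
```

In this toy setting, median normalization recovers the enriched features when they are a small minority. When most features are enriched, however, the median falls inside the enriched group, the enriched features are centered near zero, and almost none pass the threshold. This illustrates how a normalization assumption that holds for typical expression comparisons can break down for IP enrichment data.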
