Assessing replicability of findings across two studies of multiple features

Summary

Replicability analysis aims to identify the overlapping signals across independent studies that examine the same features. For this purpose we develop hypothesis testing procedures that first select the promising features from each of two studies separately. Only those features selected in both studies are then tested. The proposed procedures have theoretical guarantees regarding their control of the familywise error rate or false discovery rate on the replicability claims. They can also be used for signal discovery in each study separately, with the desired error control. Their power for detecting truly replicable findings is compared to alternatives. We illustrate the procedures on behavioural genetics data.
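The select-then-test idea described in the summary can be illustrated with a minimal sketch. It is not a reproduction of the procedures developed in the paper. It assumes a Bonferroni-type rule in which each study's p-value is compared with a threshold scaled by the number of features selected in the other study. The function name, the selection thresholds `t1` and `t2`, the level `alpha`, and the weight `w1` are all hypothetical choices made for illustration.

```python
def replicability_claims_sketch(p1, p2, t1=0.05, t2=0.05, alpha=0.05, w1=0.5):
    """Illustrative select-then-test sketch for two studies of the same features.

    p1, p2 : sequences of p-values, one per feature, in the same feature order.
    t1, t2 : hypothetical selection thresholds for study 1 and study 2.
    alpha  : nominal error level split between the two studies.
    w1     : assumed fraction of alpha assigned to study 1.
    Returns the sorted indices of features claimed as replicated.
    """
    if len(p1) != len(p2):
        raise ValueError("both studies must report the same features")
    m = len(p1)

    # Step 1: select the promising features in each study separately.
    selected1 = {j for j in range(m) if p1[j] <= t1}
    selected2 = {j for j in range(m) if p2[j] <= t2}

    # Step 2: only features selected in both studies are tested.
    candidates = selected1 & selected2
    if not candidates:
        return []

    # Step 3 (assumed Bonferroni-type rule): each study's threshold is
    # scaled by the number of features selected in the other study.
    threshold1 = w1 * alpha / len(selected2)
    threshold2 = (1.0 - w1) * alpha / len(selected1)

    return sorted(
        j for j in candidates
        if p1[j] <= threshold1 and p2[j] <= threshold2
    )


# Toy p-values for four features (made-up numbers, not data from the paper).
study1 = [0.0001, 0.03, 0.5, 0.0002]
study2 = [0.0003, 0.04, 0.0001, 0.9]
claims = replicability_claims_sketch(study1, study2)
print(claims)
```

In this toy input, features 0 and 1 are selected in both studies, but only feature 0 passes both of the assumed thresholds. Features 2 and 3 have a strong signal in one study only and are never tested for replicability.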
