A Multiple Test Correction for Streams and Cascades of Statistical Hypothesis Tests

Statistical hypothesis testing is a popular and powerful tool for inferring knowledge from data. Every such test carries a non-zero probability of making a false discovery, i.e. rejecting a null hypothesis in error. The familywise error rate (FWER) is the probability of making at least one false discovery during an inference process. FWER grows quickly with the number of hypothesis tests performed (under independence, the probability of avoiding any false discovery decays exponentially), almost guaranteeing that an error will be committed if enough tests are run and the risk is not managed, a problem known as the multiple testing problem. State-of-the-art methods for controlling FWER in multiple-comparison settings require that the set of hypotheses be predetermined. This greatly hinders statistical testing for many modern applications of statistical inference, such as model selection, because neither the set of hypotheses that will be tested, nor even their number, can be known in advance. This paper introduces Subfamilywise Multiple Testing, a multiple-testing correction for applications in which there are repeated pools of null hypotheses, from each of which a single null hypothesis is to be rejected, and in which neither the specific hypotheses nor their number are known until the final rejection decision is made. To demonstrate the importance and relevance of this work to current machine learning problems, we further refine the theory for the problem of model selection and show how to use Subfamilywise Multiple Testing to learn graphical models. We assess its ability to discover graphical models on more than 7,000 datasets, comparing it against the state of the art on data of varying size and dimensionality and with correlations of varying density and strength. Subfamilywise Multiple Testing provides a significant improvement in statistical efficiency, often requiring only half as much data to discover the same model, while strictly controlling FWER.
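To make the scale of the multiple testing problem concrete: under independence, the probability of at least one false discovery among m tests of true null hypotheses, each run at level α, is 1 − (1 − α)^m, which approaches 1 exponentially fast in m. The minimal Python sketch below (our illustration only; it shows a plain Bonferroni correction, not the Subfamilywise procedure introduced in the paper) contrasts the uncorrected FWER with the rate obtained by testing each hypothesis at the corrected level α/m:

```python
# Illustration only: FWER growth under independent true nulls, and how a
# Bonferroni-style correction (level alpha/m per test) keeps it near alpha.
# This is NOT the Subfamilywise Multiple Testing procedure from the paper.

def fwer_uncorrected(alpha: float, m: int) -> float:
    """P(at least one false discovery) over m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

def fwer_bonferroni(alpha: float, m: int) -> float:
    """Same quantity when every test is run at the corrected level alpha/m."""
    return 1.0 - (1.0 - alpha / m) ** m

if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 10, 100, 1000):
        print(f"m={m:4d}  uncorrected FWER={fwer_uncorrected(alpha, m):.3f}  "
              f"Bonferroni FWER={fwer_bonferroni(alpha, m):.3f}")
```

With α = 0.05 the uncorrected FWER climbs from 0.05 at m = 1 to about 0.99 at m = 100 and essentially 1 at m = 1000, while the Bonferroni-corrected rate stays just below 0.05 throughout. Note, however, that this baseline requires m to be known before testing begins, which is precisely the assumption the paper's correction removes.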
