Multiple Testing in Nonparametric Hidden Markov Models: An Empirical Bayes Approach

We consider the construction of efficient multiple testing procedures for a nonparametric hidden Markov model (HMM) with two states, one of which is treated as an unknown null hypothesis. We introduce a procedure, based on nonparametric empirical Bayes ideas, that controls the false discovery rate (FDR) at a user-specified level. We also provide guarantees on power, in the form of control of the true positive rate. A key step in the construction requires supremum-norm convergence of preliminary estimators of the emission densities of the HMM. We establish the existence of such estimators, with convergence at the optimal minimax rate, for an HMM with J ≥ 2 states, a result of independent interest.
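
As an illustration of how an empirical Bayes procedure can target a given FDR level, the sketch below applies a cumulative-average thresholding rule to posterior null probabilities (often called ℓ-values). It is a minimal sketch under stated assumptions, not necessarily the exact procedure of the paper: the function name `reject_by_lvalues` is hypothetical, and the ℓ-values are assumed to have been computed beforehand (for an HMM, for instance by a forward-backward pass with estimated parameters).

```python
def reject_by_lvalues(lvalues, alpha):
    """Return the sorted indices rejected by a cumulative-average rule.

    lvalues: assumed posterior probabilities that each observation comes
             from the null state (computed elsewhere; hypothetical input).
    alpha:   target FDR level in (0, 1).
    """
    # Rank hypotheses from most to least likely to be non-null.
    order = sorted(range(len(lvalues)), key=lambda i: lvalues[i])

    running_sum = 0.0
    n_reject = 0
    for k, idx in enumerate(order, start=1):
        running_sum += lvalues[idx]
        # The average l-value of the k smallest estimates the FDR
        # incurred by rejecting exactly those k hypotheses.
        if running_sum / k <= alpha:
            n_reject = k

    return sorted(order[:n_reject])


if __name__ == "__main__":
    example = [0.01, 0.9, 0.03, 0.5, 0.02, 0.2]
    print(reject_by_lvalues(example, alpha=0.1))
```

The rule rejects the largest set of hypotheses whose average ℓ-value stays at or below `alpha`. With exact ℓ-values that average is the posterior expected proportion of false discoveries; with estimated ℓ-values, the quality of the control depends on how well the model parameters and emission densities are estimated.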
