Kernel Robust Hypothesis Testing

The problem of robust hypothesis testing is studied, where under the null and the alternative hypotheses the data-generating distributions are assumed to lie in uncertainty sets, and the goal is to design a test that performs well under the worst-case distributions over those sets. In this paper, the uncertainty sets are constructed in a data-driven manner using a kernel method: they are centered at the empirical distributions of training samples from the null and alternative hypotheses, respectively, and are constrained via the distance between kernel mean embeddings of distributions in the reproducing kernel Hilbert space, i.e., the maximum mean discrepancy (MMD). Both the Bayesian setting and the Neyman-Pearson setting are investigated. For the Bayesian setting, where the goal is to minimize the worst-case error probability, an optimal test is first obtained for the case of a finite alphabet. When the alphabet is infinite, a tractable approximation is proposed to quantify the worst-case average error probability, and a kernel smoothing method is further applied to design a test that generalizes to unseen samples. A direct robust kernel test is also proposed and proved to be exponentially consistent. For the Neyman-Pearson setting, where the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm, an efficient robust kernel test is proposed and shown to be asymptotically optimal. Numerical results are provided to demonstrate the performance of the proposed robust tests.
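
To make the uncertainty-set construction concrete, the following is a minimal sketch, not the paper's implementation, of an MMD ball centered at an empirical distribution. It assumes a Gaussian kernel and the plug-in (biased) MMD estimate between two samples. The function names `gaussian_kernel`, `mmd`, and `in_uncertainty_set`, as well as the `bandwidth` and `radius` parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption of this sketch; x and y are
    2-D arrays of shape (n, d) and (m, d).
    """
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """Plug-in MMD between the empirical distributions of samples x and y.

    This is the RKHS distance between the two empirical kernel mean
    embeddings, computed from averages of Gram matrix entries.
    """
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))


def in_uncertainty_set(candidate, center, radius, bandwidth=1.0):
    """Check whether the empirical distribution of `candidate` lies in the
    MMD ball of the given radius around the empirical distribution of
    `center` (hypothetical helper, for illustration only)."""
    return mmd(candidate, center, bandwidth) <= radius
```

For example, with training samples `null_train` and `alt_train` stored as arrays of shape `(n, d)`, a call such as `in_uncertainty_set(sample, null_train, radius=0.2)` reports whether a sample's empirical distribution falls inside the null uncertainty set. The radius value and the kernel bandwidth here are placeholders; how they are chosen is not specified in the abstract.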
