Universal and Composite Hypothesis Testing via Mismatched Divergence

For the universal hypothesis testing problem, in which the goal is to decide between a known null-hypothesis distribution and some other, unknown distribution, Hoeffding proposed a universal test in the 1960s. Hoeffding's universal test statistic can be written in terms of the Kullback-Leibler (K-L) divergence between the empirical distribution of the observations and the null-hypothesis distribution. In this paper a modification of Hoeffding's test is considered, based on a relaxation of the K-L divergence referred to as the mismatched divergence. The resulting mismatched test is shown to be a generalized likelihood-ratio test (GLRT) for the case where the alternative distribution lies in a parametric family characterized by a finite-dimensional parameter; that is, it solves the corresponding composite hypothesis testing problem. For certain choices of the alternative distribution, the Hoeffding test and the mismatched test are shown to have identical asymptotic performance in terms of error exponents. A consequence of this result is that the GLRT is optimal for distinguishing a particular distribution from the others in an exponential family. It is also shown that the mismatched test has a significant advantage over the Hoeffding test in finite-sample performance for applications involving large-alphabet distributions; this advantage stems from the difference in the asymptotic variances of the two test statistics under the null hypothesis.
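As a concrete illustration of the test statistic described above, the following sketch computes the K-L divergence between the empirical distribution of finite-alphabet observations and a given null-hypothesis distribution, and compares it to a threshold. The function names, the NumPy-based implementation, and the threshold interface are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hoeffding_statistic(samples, pi):
    """K-L divergence D(Gamma_n || pi) between the empirical distribution
    Gamma_n of integer-valued samples and the null distribution pi over the
    alphabet {0, ..., len(pi)-1}. Assumes pi has full support.
    (Illustrative sketch; names are not from the paper.)"""
    pi = np.asarray(pi, dtype=float)
    counts = np.bincount(samples, minlength=len(pi))
    gamma = counts / counts.sum()        # empirical distribution Gamma_n
    mask = gamma > 0                     # convention: 0 * log(0/x) = 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / pi[mask])))

def hoeffding_test(samples, pi, threshold):
    """Declare the alternative when the statistic exceeds the threshold."""
    return hoeffding_statistic(samples, pi) >= threshold
```

When the empirical distribution matches the null exactly the statistic is zero, and it grows as the observations deviate from the null, which is what the threshold comparison exploits.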
