Second-Order Asymptotically Optimal Statistical Classification

Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according to two {\em unknown} distributions $P_1$ and $P_2$, one is tasked to classify a test sequence that is known to be generated according to either $P_1$ or $P_2$. This problem can be thought of as an analogue of binary hypothesis testing, but in the present setting the generating distributions are unknown. Due to finite-sample considerations, we consider the second-order asymptotic (or dispersion-type) tradeoff between the type-I and type-II error probabilities for tests that ensure that (i) the type-I error probability for {\em all} pairs of distributions decays exponentially fast and (ii) the type-II error probability for a {\em particular} pair of distributions is non-vanishing. We generalize our results to the classification of multiple hypotheses with the rejection option.
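
To make the setup concrete, here is a minimal Python sketch of a type-based (empirical-distribution) classifier in the spirit of Gutman's universal test: the test sequence is declared to come from $P_1$ if a divergence statistic between the type of the first training sequence and the type of the test sequence falls below a threshold. The use of the generalized Jensen-Shannon statistic, the function names, and the threshold rule below are illustrative assumptions; the abstract does not specify the exact test analyzed in the paper.

    import numpy as np

    def empirical_distribution(seq, alphabet_size):
        # Type (empirical distribution) of a sequence over {0, ..., alphabet_size - 1}.
        counts = np.bincount(np.asarray(seq), minlength=alphabet_size)
        return counts / len(seq)

    def kl_divergence(p, q):
        # KL divergence D(p || q) in nats; terms with p[i] == 0 contribute zero.
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    def gjs(p_train, p_test, alpha):
        # Generalized Jensen-Shannon divergence with weight alpha = (training length) / (test length).
        mix = (alpha * p_train + p_test) / (1 + alpha)
        return alpha * kl_divergence(p_train, mix) + kl_divergence(p_test, mix)

    def classify(train1, test, alphabet_size, threshold):
        # Declare hypothesis 1 (test drawn from the same source as train1) if the
        # statistic falls below the threshold, and hypothesis 2 otherwise.  This is
        # an illustrative sketch, not necessarily the paper's exact construction.
        alpha = len(train1) / len(test)
        t1 = empirical_distribution(train1, alphabet_size)
        tt = empirical_distribution(test, alphabet_size)
        return 1 if gjs(t1, tt, alpha) <= threshold else 2

In Gutman's classical first-order analysis, a fixed threshold $\lambda$ for a test of this form guarantees a type-I error exponent of at least $\lambda$ for every pair of distributions, matching requirement (i) above; a second-order analysis refines how the achievable type-II performance behaves at finite sample sizes.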
