Second-Order Asymptotically Optimal Statistical Classification

Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according to two unknown distributions P1 and P2, one is tasked with classifying a test sequence known to be generated according to either P1 or P2. This problem can be viewed as an analogue of binary hypothesis testing in which the generating distributions are unknown. Motivated by finite-sample considerations, we study the second-order asymptotic (dispersion-type) tradeoff between the type-I and type-II error probabilities for tests that ensure (i) the type-I error probability decays exponentially fast for all pairs of distributions and (ii) the type-II error probability is non-vanishing for a particular pair of distributions. We generalize our results to the classification of multiple hypotheses with a rejection option.
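To make the setup concrete, below is a minimal Python sketch of the classical type-based test due to Gutman for the binary case, assuming finite alphabets. It declares P1 when the empirical distribution (type) of the test sequence is far, in generalized Jensen-Shannon divergence, from the type of the training sequence for P2; the threshold lam sets the universal type-I error exponent, mirroring the asymmetric guarantee described in the abstract. The helper names (empirical_type, gjs, classify) and the threshold lam are illustrative choices, not notation taken from the paper.

```python
import numpy as np

def empirical_type(seq, alphabet_size):
    """Empirical distribution (type) of a sequence over {0, ..., alphabet_size - 1}."""
    return np.bincount(seq, minlength=alphabet_size) / len(seq)

def kl(p, q):
    """KL divergence D(p || q); the callers below guarantee q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def gjs(p, q, alpha):
    """Generalized Jensen-Shannon divergence:
    GJS(p, q, alpha) = alpha * D(p || m) + D(q || m), with m = (alpha*p + q)/(1 + alpha)."""
    m = (alpha * p + q) / (1 + alpha)
    return alpha * kl(p, m) + kl(q, m)

def classify(train1, train2, test, alphabet_size, lam):
    """Gutman-style binary test: declare P1 iff the test type is at least
    lam-far (in GJS) from the type of the P2 training sequence.
    Note the decision statistic uses only train2 and test; this asymmetry is
    what yields a type-I exponent of (roughly) lam for ALL pairs (P1, P2),
    while lam controls the non-vanishing type-II error probability."""
    alpha = len(train1) / len(test)          # training-to-test length ratio
    t2 = empirical_type(train2, alphabet_size)
    ty = empirical_type(test, alphabet_size)
    return 1 if gjs(t2, ty, alpha) >= lam else 2

# Usage on synthetic data: longer training sequences sharpen the empirical types.
rng = np.random.default_rng(0)
P1, P2 = np.array([0.7, 0.2, 0.1]), np.array([0.2, 0.3, 0.5])
x1 = rng.choice(3, size=2000, p=P1)          # training sequence from P1
x2 = rng.choice(3, size=2000, p=P2)          # training sequence from P2
y = rng.choice(3, size=500, p=P1)            # test sequence, truly from P1
print(classify(x1, x2, y, alphabet_size=3, lam=0.05))  # expect 1
```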
