Inference Under Information Constraints I: Lower Bounds From Chi-Square Contraction

Multiple players are each given one independent sample, about which they can only provide limited information to a central referee. Each player is allowed to describe its observed sample to the referee using a channel from a family of channels $\mathcal{W}$, which can be instantiated to capture, among others, both the communication- and privacy-constrained settings. The referee uses the players' messages to solve an inference problem on the unknown distribution that generated the samples. We derive lower bounds for the sample complexity of learning and testing discrete distributions in this information-constrained setting. Underlying our bounds is a characterization of the contraction in chi-square distance between the observed distributions of the samples when information constraints are imposed. This contraction is captured in a local neighborhood in terms of the chi-square and decoupled chi-square fluctuations of a given channel, two quantities we introduce. The former captures the average distance between the distributions of the channel output for two product distributions on the input, and the latter captures it for a product distribution and a mixture of product distributions on the input. Our bounds are tight for both public- and private-coin protocols. Interestingly, the sample complexity of testing is order-wise higher when restricted to private-coin protocols.
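As a rough illustration of what "contraction in chi-square distance" means, the sketch below computes the chi-square divergence between a uniform distribution and a perturbed one, before and after both are passed through a fixed channel. It only demonstrates the standard data-processing inequality for chi-square divergence; it does not implement the chi-square fluctuation quantities introduced in the paper. The function names, the alphabet size `k`, the perturbation `eps`, and the 1-bit channel `W` are all assumptions chosen for illustration.

```python
def chi_square(p, q):
    """Chi-square divergence sum_x (p(x) - q(x))^2 / q(x) for discrete p, q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def push_through(p, W):
    """Distribution of the channel output when the input is drawn from p.

    W is a row-stochastic matrix: W[x][y] = Pr(output = y | input = x).
    """
    num_outputs = len(W[0])
    return [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(num_outputs)]


# Hypothetical setup: alphabet of size k = 4, perturbation parameter eps.
k = 4
eps = 0.2
uniform = [1 / k] * k
# Alternate +2*eps/k and -2*eps/k around the uniform distribution.
perturbed = [(1 + 2 * eps * (-1) ** x) / k for x in range(k)]

# Illustrative 1-bit channel: report whether the sample equals symbol 0.
W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
]

before = chi_square(perturbed, uniform)
after = chi_square(push_through(perturbed, W), push_through(uniform, W))

# Data processing: the divergence can only shrink after the channel.
print(f"chi-square before channel: {before:.4f}")
print(f"chi-square after channel:  {after:.4f}")
```

For this perturbation the input-side divergence equals `4 * eps**2` regardless of `k`, while the divergence seen by the referee depends on the channel. The amount of shrinkage for a given family of channels is the kind of quantity the lower bounds are built on.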
