Average-Case Communication Complexity of Statistical Problems

We study statistical problems, such as planted clique, its variants, and sparse principal component analysis, in the context of average-case communication complexity. Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models. Communication complexity is the main tool for proving lower bounds in these models, yet many prior results do not hold in an average-case setting. We provide a general reduction method that preserves the input distribution for problems involving a random graph or matrix with planted structure. We then derive two-party and multi-party communication lower bounds for detecting or finding planted cliques, bipartite cliques, and related problems. As a consequence, we obtain new bounds on the query complexity in the edge-probe, vector-matrix-vector, matrix-vector, linear sketching, and F2-sketching models. Many of these results are nearly tight, and we use our techniques to provide simple proofs of some known lower bounds for the edge-probe model.
