Testing Distributions of Huge Objects

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings). In particular, the distance between distributions is defined as the earth mover’s distance with respect to the relative Hamming distance between strings. We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model. Two such cases are uniform distributions and pairs of identical distributions, for which we obtain the following results.
• Whether a distribution over n-bit long strings is uniform on some set of size m can be tested with query complexity Õ(m/ε^3), where ε > (log_2 m)/n is the proximity parameter.
• Whether two distributions over n-bit long strings that have support size at most m are identical can be tested with query complexity Õ(m^{2/3}/ε^3).
Both upper bounds are fairly tight; that is, for ε = Ω(1), the first task requires Ω(m^c) queries for any c < 1 and n = ω(log m), whereas the second task requires Ω(m^{2/3}) queries.
Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.

∗ Partially supported by the Israel Science Foundation (grant No. 1041/18).
† Department of Computer Science, Weizmann Institute of Science, Rehovot, Israel. E-mail: oded.goldreich@weizmann.ac.il. Additional funding received from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 819702).
‡ School of Electrical Engineering, Tel Aviv University, Tel Aviv, Israel. E-mail: danaron@tau.ac.il

Electronic Colloquium on Computational Complexity, Report No. 133 (2021). ISSN 1433-8092.
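To make the distance measure used in the abstract concrete, the following is a minimal illustrative sketch, not taken from the paper. It assumes that both distributions are uniform over multisets of the same size, in which case an optimal transport plan can be taken to be a perfect matching, so the earth mover's distance is the minimum over matchings of the average relative Hamming distance. The function names `relative_hamming` and `emd_uniform_multisets` are hypothetical, and the brute-force search over permutations is only feasible for very small multisets.

```python
from itertools import permutations


def relative_hamming(x: str, y: str) -> float:
    """Fraction of positions at which two equal-length strings differ."""
    if len(x) != len(y) or not x:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def emd_uniform_multisets(xs: list[str], ys: list[str]) -> float:
    """Earth mover's distance between the uniform distributions on two
    equal-size multisets of strings, with relative Hamming distance as
    the ground metric.

    Assumption (illustration only): both distributions are uniform over
    multisets of the same size k, so it suffices to minimize over the k!
    perfect matchings. This brute force is exponential in k.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("multisets must be non-empty and of equal size")
    k = len(xs)
    best_total = min(
        sum(relative_hamming(x, ys[j]) for x, j in zip(xs, perm))
        for perm in permutations(range(k))
    )
    return best_total / k


if __name__ == "__main__":
    X = ["0000", "1111"]  # uniform over these two 4-bit strings
    Y = ["0001", "1111"]  # differs from X in one bit of one string
    print(emd_uniform_multisets(X, Y))
```

In this example the two distributions are at total variation distance 1/2, since the strings 0000 and 0001 are distinct, yet their earth mover's distance is only 1/8, because half of the probability mass has to be moved across a relative Hamming distance of 1/4.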
