Open Problems in Data Streams, Property Testing, and Related Topics

This document contains a list of open problems and research directions suggested by participants at the Bertinoro Workshop on Sublinear Algorithms (May 2011) and the IITK Workshop on Algorithms for Processing Massive Data Sets (December 2009). Many of the questions were discussed at the workshops or were posed during presentations. Further details can be found at:

www.dcs.warwick.ac.uk/~czumaj/Bertinoro_2011
www2.cse.iitk.ac.in/~fsttcs/2009/wapmds

Lists compiled by Piotr Indyk (indyk@mit.edu), Andrew McGregor (mcgregor@cs.umass.edu), Ilan Newman (ilan@cs.haifa.ac.il), and Krzysztof Onak (konak@cs.cmu.edu).

BERTINORO WORKSHOP PARTICIPANTS: Nir Ailon, Noga Alon, Alexandr Andoni, Arnab Bhattacharyya, Vladimir Braverman, Amit Chakrabarti, Graham Cormode, Artur Czumaj, Pierre Fraigniaud, Oded Goldreich, Nir Halman, Sariel Har-Peled, Piotr Indyk, Tali Kaufman, Robert Krauthgamer, Oded Lachish, Michael Mahoney, Andrew McGregor, Morteza Monemizadeh, Jelani Nelson, Ilan Newman, Krzysztof Onak, Ely Porat, Sofya Raskhodnikova, Ronitt Rubinfeld, Rocco Servedio, Madhu Sudan, Ben Recht, Justin Romberg, Dana Ron, C. Seshadhri, Asaf Shapira, Christian Sohler, Gilad Tsur, Paul Valiant, Roger Wattenhofer, David Woodruff, Ning Xie, and Yuichi Yoshida.

KANPUR WORKSHOP PARTICIPANTS: Pankaj K. Agarwal, Kook Jin Ahn, Paul Beame, Amit Chakrabarti, Inderjit Dhillon, Dan Feldman, Sumit Ganguly, Sudipto Guha, Piotr Indyk, T. S. Jayram, Christiane Lammersen, Michael Mahoney, Andrew McGregor, Jelani Nelson, Krzysztof Onak, Rina Panigrahy, Ely Porat, Jaikumar Radhakrishnan, Christian Sohler, Joel Tropp, Matthias Westermann, and David Woodruff.
