String Matching: Communication, Circuits, and Learning

String matching is the problem of deciding whether a given $n$-bit string contains a given $k$-bit pattern. We study the complexity of this problem in three settings.

Communication complexity. For small $k$, we provide near-optimal upper and lower bounds on the communication complexity of string matching. For large $k$, our bounds leave open an exponential gap; we exhibit some evidence for the existence of a better protocol.

Circuit complexity. We present several upper and lower bounds on the size of circuits with threshold and DeMorgan gates solving the string matching problem. As in the communication setting, our bounds are near-optimal for small $k$.

Learning. We consider the problem of learning a hidden pattern of length at most $k$ relative to the classifier that assigns 1 to every string that contains the pattern. We prove optimal bounds on the VC dimension and sample complexity of this problem.
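To make the problem statement concrete, the following minimal Python sketch spells out the string matching predicate and the classifier induced by a hidden pattern. The function names `contains_pattern` and `pattern_classifier` are illustrative assumptions, not notation from the paper, and bit strings are represented as Python strings over `0` and `1`. The naive scan shown here only fixes the definition; it is unrelated to the protocols, circuits, or learning bounds studied in the paper.

```python
def contains_pattern(text: str, pattern: str) -> bool:
    """Return True if the bit string `text` contains `pattern` contiguously.

    Naive scan over all n - k + 1 alignments (illustrative only).
    """
    n, k = len(text), len(pattern)
    # If k > n the range is empty and the result is False.
    return any(text[i:i + k] == pattern for i in range(n - k + 1))


def pattern_classifier(pattern: str):
    """Hypothetical helper: the classifier induced by a hidden pattern.

    It assigns 1 to every string that contains the pattern and 0 otherwise.
    """
    return lambda text: 1 if contains_pattern(text, pattern) else 0


if __name__ == "__main__":
    print(contains_pattern("0110100", "101"))
    h = pattern_classifier("11")
    print(h("0110"), h("0101"))
```

In the learning setting described above, the learner would see labelled strings produced by a classifier such as `pattern_classifier(p)` for an unknown pattern `p` of length at most $k$.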
