On the Fine Grained Complexity of Polynomial Time Problems Given Correlated Instances

We set out to study the impact of having access to correlated instances on the fine-grained complexity of polynomial time problems, which have notoriously resisted improvement. In particular, we show how to use a logarithmic number of auxiliary correlated instances to obtain o(n^2) time algorithms for the longest common subsequence (LCS) problem and the minimum edit distance (EDIT) problem. For the problem of longest common subsequence of k sequences we show an O(nk log n) time algorithm with access to a logarithmic number of auxiliary correlated instances. Our results hold for a worst-case choice of the primary instance, whereas the auxiliary correlated instances are chosen according to a natural correlation model between instances. Previously, it has been shown that any improvement over O(n^2) for the worst-case complexity of the longest common subsequence and minimum edit distance problems would yield algorithms radically better than those currently known for a host of long-studied polynomial time problems, such as finding a pair of orthogonal vectors, and would imply that the Strong Exponential Time Hypothesis is false. The best known algorithm for the multiple sequence longest common subsequence problem is a variant of dynamic programming which requires O(n^k) worst-case runtime. We note that sequence alignment is often used to identify conserved sequence regions across a group of DNA, RNA, or protein sequences hypothesized to be evolutionarily related, or as an aid in establishing evolutionary relationships by constructing phylogenetic trees, but it is notoriously computationally prohibitive for k > 3. An intriguing question, which served as an inspiration for our work, is to find correlation models which coincide with evolutionary models and other relationships, and for which our approach to multiple sequence alignment gives provable guarantees.
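For context, the quadratic baseline discussed above is the textbook dynamic program for two-sequence LCS. The Python sketch below shows that baseline together with a hypothetical perturbation-style sampler for auxiliary instances, in which each character of the primary instance is independently resampled with probability p; the sampler is an illustrative assumption for the correlation setting, not the paper's actual correlation model, which the abstract does not specify.

```python
import random


def lcs_length(a: str, b: str) -> int:
    """Classic dynamic program for two-sequence LCS: Theta(n^2) time,
    the baseline that the correlated-instance algorithms improve upon."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def correlated_instance(x: str, alphabet: str, p: float = 0.1) -> str:
    """Hypothetical perturbation model (an assumption for illustration):
    each character of the primary instance is independently resampled
    uniformly from the alphabet with probability p."""
    return "".join(
        random.choice(alphabet) if random.random() < p else c for c in x
    )


if __name__ == "__main__":
    random.seed(0)
    primary = "".join(random.choice("ACGT") for _ in range(40))
    # A logarithmic number of auxiliary instances, matching the setup
    # described in the abstract.
    auxiliary = [correlated_instance(primary, "ACGT") for _ in range(6)]
    print(lcs_length(primary, auxiliary[0]))
```

For k sequences the same dynamic program generalizes to a k-dimensional table with n^k entries, which is the O(n^k) worst-case cost of the multiple sequence LCS algorithm mentioned above.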
