Seed optimization for i.i.d. similarities is no easier than optimal Golomb ruler design

The spaced seed is a filtration method for efficiently identifying regions of interest in string similarity searches. It is important to find the optimal spaced seed, that is, the one that achieves the highest search sensitivity. For some simple distributions of the similarities, the seed optimization problem was proved to be not NP-hard. On the other hand, no polynomial-time algorithm has been found despite extensive research in the literature. In this article we examine the hardness of the seed optimization problem by a polynomial-time reduction from the optimal Golomb ruler design problem, which is a well-known difficult (but not NP-hard) problem in combinatorial design.
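To make the two objects named above concrete, the following minimal Python sketch illustrates them under stated assumptions. It is not the reduction studied in the article. It assumes a seed is written as a string over `1` (position that must match) and `*` (don't-care position), and a similarity as a string over `1` (match) and `0` (mismatch). The function names `is_golomb_ruler` and `seed_hits` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if all pairwise differences between the marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


def seed_hits(seed, similarity):
    """Return the start positions at which the spaced seed hits the similarity.

    Assumed encoding (illustrative only):
      seed       -- string over {'1', '*'}; '1' = required match, '*' = don't care
      similarity -- string over {'1', '0'}; '1' = match, '0' = mismatch
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    return [
        start
        for start in range(len(similarity) - len(seed) + 1)
        if all(similarity[start + i] == "1" for i in required)
    ]


if __name__ == "__main__":
    print(is_golomb_ruler([0, 1, 4, 6]))  # marks with pairwise-distinct differences
    print(seed_hits("1*1", "10111"))      # start positions where the seed hits
```

The sensitivity of a seed is the probability that it hits a random similarity at least once. Under an i.i.d. model this could be estimated by sampling similarities and counting how often `seed_hits` returns a non-empty list, whereas finding the seed that maximizes this probability is the optimization problem whose hardness the article examines.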
