Design and analysis of periodic multiple seeds

A wide class of approximate pattern matching algorithms is based on a filtration phase in which spaced seeds are used to discard regions where a match is unlikely to occur. Determining the "optimal" shape of a spaced seed in a particular setting is known to be hard: in practice, spaced seeds are chosen using heuristics or by considering a restricted family of seeds with "reasonably good" performance. In this paper we consider the family of spaced seeds with a periodic structure. Such seeds have already proved valuable both as a theoretical tool and in bioinformatics applications. We show that known combinatorial objects, namely Difference Sets and Families and Steiner Systems, naturally lead to the design of lossless periodic seeds for approximate pattern matching with k=2 and k=3 mismatches. We analyze in depth the properties of the resulting seeds, also obtaining insights into seeds without a periodic structure. The results of the analysis are then used to guide an experimental evaluation of the effectiveness of periodic seeds for pattern lengths of practical interest. Our results give a complete picture of the strengths and limitations of periodic seeds, and can be used by practitioners to design effective approximate pattern matching algorithms.
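
To make the notion of a lossless seed concrete, the following is a minimal brute-force sketch, not the construction used in the paper. It assumes a seed is written as a string in which `#` marks a position that must match and any other character is a don't-care, and it calls the seed lossless for pattern length m and k mismatches if, for every placement of k mismatches, at least one position of the seed avoids all of them. The function names `seed_offsets` and `is_lossless` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations


def seed_offsets(seed):
    """Offsets of the match-required positions ('#') in a seed such as '##.#'."""
    return [i for i, c in enumerate(seed) if c == "#"]


def is_lossless(seed, m, k):
    """Brute-force check (assumed definition): for every way of placing k
    mismatches in an alignment of length m, some placement of the seed
    touches no mismatch. Cost grows as C(m, k), so use small m and k only."""
    offsets = seed_offsets(seed)
    span = len(seed)
    if span > m:
        return False  # the seed does not fit in the alignment at all
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        # Try every start position p of the seed inside the alignment.
        hit_free = any(
            all(p + o not in bad for o in offsets)
            for p in range(m - span + 1)
        )
        if not hit_free:
            return False  # this mismatch placement defeats the seed
    return True


if __name__ == "__main__":
    # Contiguous seeds: with m = 11 and k = 2, two mismatches always leave
    # a run of at least 3 matches, but not always a run of 4.
    print(is_lossless("###", 11, 2))
    print(is_lossless("####", 11, 2))
```

A periodic seed can be checked the same way by passing a string that repeats a fixed pattern of `#` and don't-care characters. The enumeration is exponential in k, so it is only practical for the small values (k=2, k=3) discussed above.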
