Shuffling biological sequences with motif constraints

We study the following problem: given a biological sequence S, a multiset M of motifs, and an integer k, generate uniformly random sequences that contain the given motifs and have exactly the same occurrence frequencies of k-lets (i.e., factors of length k) as S. This question involves difficult problems: in particular, we state that deciding whether a sequence respects given motif constraints is NP-complete. Nevertheless, we give a random generation algorithm that turns out to be experimentally efficient.
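
To make the k-let constraint concrete, the following minimal Python sketch counts the factors of length k in a sequence and checks whether a candidate shuffle has the same counts as the original. The function names `klet_counts` and `preserves_klets` are hypothetical and chosen only for illustration. The sketch covers the k-let condition alone: it does not model the motif constraints and is not the generation algorithm referred to above.

```python
from collections import Counter


def klet_counts(seq: str, k: int) -> Counter:
    """Count every factor (contiguous substring) of length k in seq."""
    # A sequence of length n has n - k + 1 factors of length k
    # (none if k > n, in which case the range is empty).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def preserves_klets(original: str, candidate: str, k: int) -> bool:
    """Return True if candidate has exactly the same k-let counts as original."""
    return klet_counts(original, k) == klet_counts(candidate, k)


if __name__ == "__main__":
    s = "ACAGA"
    # Both sequences contain the 2-lets AC, CA, AG, GA exactly once each.
    print(preserves_klets(s, "AGACA", 2))
    # Same letters, but the 2-lets differ (AA and CG appear, CA and AG do not).
    print(preserves_klets(s, "AACGA", 2))
```

For k = 1 the check reduces to comparing letter compositions, so any permutation of S passes. For larger k the set of admissible sequences shrinks, which is what makes uniform generation non-trivial.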
