The shortest common supersequence problem in a microarray production setting

MOTIVATION During microarray production, several thousand oligonucleotides (short DNA sequences) are synthesized in parallel, one nucleotide at a time. We seek the shortest possible nucleotide deposition sequence that synthesizes all oligos, in order to reduce production time and increase oligo quality. We therefore study the shortest common supersequence problem for several thousand short strings over a four-letter alphabet.

RESULTS We present a statistical analysis of the basic ALPHABET-LEFTMOST approximation algorithm and propose several practical heuristics to reduce the length of the supersequence. Our results show that it is hard to beat ALPHABET-LEFTMOST in the microarray production setting by more than two characters, but these savings can improve overall oligo quality by more than four percent.

AVAILABILITY Source code in C may be obtained by contacting the author, or from http://oligos.molgen.mpg.de.
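To make the baseline concrete, the following is a minimal Python sketch of an ALPHABET-LEFTMOST style procedure as it is commonly described: cycle through the alphabet in a fixed order and, for each letter, advance every oligo whose leftmost unsynthesized character matches it. It is an illustration under stated assumptions, not the author's C implementation. The function names `alphabet_leftmost` and `is_subsequence`, the letter order `ACGT`, and the choice to skip a letter that no oligo currently needs are all assumptions made here.

```python
def alphabet_leftmost(strings, alphabet="ACGT"):
    """Build a common supersequence by cycling through `alphabet`.

    Assumption: a letter is deposited only if at least one string is
    waiting for it; letters that nobody needs are skipped.
    """
    letters = set(alphabet)
    for s in strings:
        if not set(s) <= letters:
            raise ValueError(f"string {s!r} uses letters outside {alphabet!r}")

    pos = [0] * len(strings)                 # next unmatched index per string
    remaining = sum(1 for s in strings if s)  # strings not yet fully built
    out = []
    step = 0
    while remaining:
        letter = alphabet[step % len(alphabet)]
        step += 1
        used = False
        for k, s in enumerate(strings):
            # Extend every string whose leftmost open character is `letter`.
            if pos[k] < len(s) and s[pos[k]] == letter:
                pos[k] += 1
                used = True
                if pos[k] == len(s):
                    remaining -= 1
        if used:
            out.append(letter)
    return "".join(out)


def is_subsequence(s, t):
    """Return True if `s` can be obtained from `t` by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


if __name__ == "__main__":
    oligos = ["ACGT", "TGCA"]
    seq = alphabet_leftmost(oligos)
    print(seq, len(seq))
    print(all(is_subsequence(o, seq) for o in oligos))
```

Because every full pass over the alphabet advances each unfinished string by at least one character, the result is never longer than the alphabet size times the length of the longest string; the heuristics mentioned in RESULTS aim to shave characters off this baseline.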
