An important combinatorial problem, motivated by DNA sequencing in molecular biology, is the reconstruction of a sequence over a small finite alphabet from the collection of its probes (the sequence spectrum), obtained by sliding a fixed sampling pattern over the sequence. Such reconstruction is required for Sequencing-by-Hybridization (SBH), a novel DNA sequencing technique based on an array (SBH chip) of short nucleotide sequences (probes). Once the sequence spectrum has been obtained biochemically, a combinatorial method is used to reconstruct the DNA sequence from it. Since technology limits the number of probes on the SBH chip, a challenging combinatorial question is the design of the smallest set of probes that can sequence an arbitrary DNA string of a given length. We present in this work a novel probe design, crucially based on the use of universal bases [bases that bind to any nucleotide (Loakes and Brown, 1994)], that drastically improves the performance of the SBH process and asymptotically comes within a constant factor of the information-theoretic bound. Furthermore, the sequencing algorithm we propose is substantially simpler than the Eulerian path method used in previous solutions of this problem.
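To make the notion of a spectrum concrete, the following minimal Python sketch slides a fixed sampling pattern over a sequence and collects the resulting probes. The encoding is an assumption made for illustration and is not taken from the paper: the pattern is a string in which `1` marks a position read by a natural base and `0` marks a universal base, shown as `*` in the probe. The function name `spectrum` is likewise hypothetical.

```python
def spectrum(sequence, pattern):
    """Return the set of probes seen when `pattern` slides over `sequence`.

    Illustrative encoding (an assumption): in `pattern`, '1' is a position
    read by a natural base and '0' is a universal base, written as '*'.
    """
    width = len(pattern)
    probes = set()
    # Visit every window of the pattern's width, one position at a time.
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Keep the base where the pattern reads it; mask it otherwise.
        probe = "".join(
            base if flag == "1" else "*"
            for base, flag in zip(window, pattern)
        )
        probes.add(probe)
    return probes


if __name__ == "__main__":
    # An all-ones pattern gives the classical spectrum of contiguous substrings.
    print(sorted(spectrum("ACGTAC", "111")))
    # A gapped pattern leaves the third position unread.
    print(sorted(spectrum("ACGTAC", "1101")))
```

The spectrum is a set, so it records neither the order nor the multiplicity of the probes; recovering the sequence from it is the reconstruction problem described above.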
[1] W. Bains, et al. A novel method for nucleic acid sequence determination. Journal of Theoretical Biology, 1988.
[2] P. Pevzner. l-Tuple DNA sequencing: computer analysis. Journal of Biomolecular Structure & Dynamics, 1989.
[3] D. M. Brown, et al. 5-Nitroindole as an universal base analogue. Nucleic Acids Research, 1994.
[4] Gesine Reinert, et al. Poisson Process Approximation for Sequence Repeats and Sequencing by Hybridization. J. Comput. Biol., 1996.
[5] L. Gordon, et al. Two moments suffice for Poisson approximations: the Chen-Stein method. 1989.
[6] P. Pevzner, et al. Improved chips for sequencing by hybridization. Journal of Biomolecular Structure & Dynamics, 1991.
[7] Martin E. Dyer, et al. The Probability of Unique Solutions of Sequencing by Hybridization. J. Comput. Biol., 1994.
[8] R. Drmanac, et al. Sequencing of megabase plus DNA by hybridization: theory of the method. Genomics, 1989.