Sequencing-by-Hybridization at the Information-Theory Bound: An Optimal Algorithm

In a recent paper (Preparata et al., 1999) we introduced a novel probing scheme for DNA sequencing by hybridization (SBH). The new gapped-probe scheme combines natural and universal bases in a well-defined periodic pattern. It has been shown (Preparata et al., 1999) that the gapped-probe scheme performs significantly better than the standard scheme based on oligomer probes, where performance is measured as the length of a sequence that can be uniquely reconstructed with a probe library of given size. In this paper we present and analyze a new, more powerful sequencing algorithm for the gapped-probe scheme. We prove that the new algorithm exploits the full potential of the SBH technology: with high confidence, its performance comes within a small constant factor (about 2) of the information-theory bound. Moreover, this performance is achieved while keeping the running time linear in the length of the target sequence.
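To make the idea of a periodic gapped probe concrete, the sketch below computes the spectrum of a target sequence, that is, the set of probes that hybridize to it. The exact pattern is not given here. For illustration only, we assume an (s, r) pattern of the form X^s (U^{s-1} X)^r, where X is a natural base and U a universal base; the precise definition should be taken from Preparata et al. (1999). The function names `gapped_pattern` and `probe_spectrum` are hypothetical. The sketch ignores probe multiplicity and hybridization errors, and it does not implement the reconstruction algorithm analyzed in this paper.

```python
def gapped_pattern(s: int, r: int) -> str:
    """Hypothetical (s, r) gapped pattern X^s (U^(s-1) X)^r.

    'X' marks a natural base (read by the probe); 'U' marks a universal
    base (pairs with any nucleotide). This form is an assumption.
    """
    return "X" * s + ("U" * (s - 1) + "X") * r


def probe_spectrum(target: str, pattern: str) -> set[str]:
    """Return the set of natural-base readings that hybridize to target.

    Each probe is identified only by the bases at its 'X' positions,
    because universal bases match anything. Multiplicity is ignored.
    """
    span = len(pattern)
    natural_positions = [i for i, c in enumerate(pattern) if c == "X"]
    spectrum = set()
    # Slide the probe footprint over every position of the target.
    for start in range(len(target) - span + 1):
        window = target[start:start + span]
        spectrum.add("".join(window[i] for i in natural_positions))
    return spectrum


# A standard 4-mer probe and an assumed (2, 2) gapped probe both specify
# 4 natural bases, so both come from a library of 4**4 = 256 probes.
standard = "XXXX"
gapped = gapped_pattern(2, 2)  # "XXUXUX"
target = "ACGTACGTAC"
print(sorted(probe_spectrum(target, standard)))
print(sorted(probe_spectrum(target, gapped)))
```

In both schemes the library size depends only on the number of natural bases, here s + r for the assumed gapped form. The gapped probe, however, spans a longer stretch of the target, which is the intuition behind its better reconstruction performance.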