Sequencing-by-hybridization revisited: the analog-spectrum proposal

All published approaches to DNA sequencing by hybridization (SBH) consist of two steps: the biochemical acquisition of the spectrum of a target sequence (the set of its subsequences conforming to a given probing pattern), followed by the algorithmic reconstruction of the sequence from that spectrum. In the "standard" or "uniform" approach, the probing pattern is a string of length $L$, and the length of reliably reconstructible sequences is known to be $m_{\mathrm{len}} = O(2^L)$. For a fixed microarray area, higher sequencing performance can be achieved by inserting nonprobing gaps ("wild-cards") in the probing pattern. The reconstruction, however, must cope with the fooling probes that arise from the gaps, and the algorithm fails when the spectrum becomes too densely populated; even so, we can achieve $m_{\mathrm{comp}} = O(4^L)$. Despite the combinatorial success of gapped probing, all current approaches are based on a biochemically unrealistic model of spectrum acquisition (the digital-spectrum model). The reality of hybridization is much more complex. In this paper, we depart from the conventional model and propose an alternative, the analog-spectrum model, which more closely reflects the biochemical process. The new model reestablishes probe length as the performance-governing factor and adopts "semidegenerate bases" as suitable emulators of universal bases, which are currently inadequate. One important conclusion is that accurate biochemical measurements are pivotal to the success of SBH. The theoretical proposal presented in this paper should be a convincing stimulus for the needed biotechnological work.
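
To make the notion of a spectrum concrete, the following is a minimal Python sketch of the conventional digital-spectrum model only, not of the analog-spectrum model proposed here. The function name `spectrum`, the encoding of a probing pattern as a string of `1` (probing position) and `0` (nonprobing gap), and the use of `*` for a wild-card are illustrative assumptions, not notation taken from the paper.

```python
from typing import Set


def spectrum(target: str, pattern: str) -> Set[str]:
    """Return the digital spectrum of `target` under `pattern`.

    Assumed encoding (hypothetical, for illustration): `pattern` is a
    string over {'1', '0'}, where '1' is a probing position and '0' is a
    nonprobing gap. Gap positions appear as '*' in the returned probes.
    """
    if not pattern or pattern[0] != "1" or pattern[-1] != "1":
        raise ValueError("pattern must start and end with a probing position")
    span = len(pattern)
    probes = set()
    # Slide the pattern over every position where it fits in the target.
    for i in range(len(target) - span + 1):
        window = target[i:i + span]
        probe = "".join(b if p == "1" else "*" for b, p in zip(window, pattern))
        probes.add(probe)
    return probes


target = "ACGTACGGT"
uniform = spectrum(target, "1111")    # uniform probing, L = 4
gapped = spectrum(target, "110101")   # same 4 probing positions, 2 gaps
print(sorted(uniform))
print(sorted(gapped))
```

Both patterns use four probing positions, but the gapped one spans six bases of the target. Because the result is a set, it records only the presence or absence of each probe, which is the simplification the digital-spectrum model makes.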
