An algorithm for the DNA sequence generation from k-tuple word contents of the minimal number of random fragments.

An algorithm is described for generating a long sequence written in a four-letter alphabet from the constituent k-tuple words of a minimal number of separate, randomly defined fragments of the starting sequence. It is primarily intended for use in the sequencing by hybridization (SBH) process, a potential method for sequencing human genomic DNA (Drmanac et al., Genomics 4, pp. 114-128, 1989). The algorithm is based on previously defined rules and informative entities of the linear sequence. It requires neither knowledge of the number of occurrences of a given k-tuple in the sequence fragments nor information on which k-tuple words lie at the ends of a fragment. It operates on mixed k-tuple content of various lengths. The concept of the algorithm allows it to operate on k-tuple sets containing false positive and false negative k-tuples. The content of false k-tuples primarily affects the completeness of the generated sequence, and affects its correctness only in specific cases. The algorithm can be used to optimize SBH parameters in simulation experiments, as well as to generate sequences in real SBH experiments on genomic DNA.
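The idea of rebuilding a sequence from k-tuple presence alone can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a single word length k, whereas the paper handles mixed lengths, and it simply joins two words when their (k-1)-letter overlap is unambiguous in both directions. The function names `kmers_of` and `assemble_unambiguous` are hypothetical, chosen for this illustration only. No occurrence counts and no knowledge of fragment ends are used, and a missing (false negative) word splits the output into shorter pieces rather than producing a wrong sequence.

```python
from collections import defaultdict


def kmers_of(seq, k):
    """Return the set of k-tuple words present in seq (presence only, no counts)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assemble_unambiguous(words, k):
    """Join k-tuples through (k-1)-letter overlaps wherever the join is unique.

    Illustrative sketch only: extension stops at every branch point, so
    ambiguity or missing words yields several shorter subfragments.
    """
    by_prefix = defaultdict(set)
    for w in words:
        by_prefix[w[:k - 1]].add(w)

    succ = defaultdict(set)  # word -> words that may follow it
    pred = defaultdict(set)  # word -> words that may precede it
    for w in words:
        for v in by_prefix.get(w[1:], ()):
            succ[w].add(v)
            pred[v].add(w)

    def unique_next(w):
        if len(succ[w]) != 1:
            return None
        (v,) = succ[w]
        return v if len(pred[v]) == 1 else None

    def unique_prev(w):
        if len(pred[w]) != 1:
            return None
        (u,) = pred[w]
        return u if len(succ[u]) == 1 else None

    used = set()
    contigs = []
    for w in sorted(words):
        if w in used:
            continue
        # Walk backwards to the first word of this unambiguous chain.
        start, seen = w, {w}
        while True:
            u = unique_prev(start)
            if u is None or u in seen:
                break
            start = u
            seen.add(u)
        # Walk forwards, appending one letter per joined word.
        contig, cur = start, start
        used.add(start)
        while True:
            v = unique_next(cur)
            if v is None or v in used:
                break
            contig += v[-1]
            used.add(v)
            cur = v
        contigs.append(contig)
    return sorted(contigs)


if __name__ == "__main__":
    words = kmers_of("ACGTTGCA", 4)
    print(assemble_unambiguous(words, 4))             # ['ACGTTGCA']
    # Simulate a false negative: one word is not detected.
    print(assemble_unambiguous(words - {"GTTG"}, 4))  # ['ACGTT', 'TTGCA']
```

In this toy case the complete word set regenerates the starting sequence, while dropping one word leaves two correct but disconnected subfragments. This matches the abstract's remark that false k-tuples mainly reduce completeness.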

[1] W. Gilbert, et al. A new method for sequencing DNA. 1977, Proceedings of the National Academy of Sciences of the United States of America.

[2] F. Sanger, et al. DNA sequencing with chain-terminating inhibitors. 1977, Proceedings of the National Academy of Sciences of the United States of America.

[3] A. Poustka, et al. Molecular approaches to mammalian genetics. 1986, Cold Spring Harbor symposia on quantitative biology.

[4] W. Bains, et al. A novel method for nucleic acid sequence determination. 1988, Journal of theoretical biology.

[5] K. Khrapko, et al. An oligonucleotide hybridization approach to DNA sequencing. 1989, FEBS letters.

[6] R. Drmanac, et al. Sequencing of megabase plus DNA by hybridization: theory of the method. 1989, Genomics.

[7] P. Pevzner. l-Tuple DNA sequencing: computer analysis. 1989, Journal of biomolecular structure & dynamics.

[8] H. Erfle, et al. Automated DNA sequencing of the human HPRT locus. 1990, Genomics.

[9] R. Drmanac, et al. Reliable hybridization of oligonucleotides as short as six nucleotides. 1990, DNA and cell biology.