A New Algorithm for DNA Sequence Assembly

Since the advent of rapid DNA sequencing methods in 1976, scientists have faced the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method for this task, and many conventional algorithms for it are based on the notion of pairwise fragment overlap. Whereas shotgun sequencing infers a DNA sequence from the sequences of overlapping fragments, a more recent and complementary method, sequencing by hybridization (SBH), infers a DNA sequence from the set of oligomers that represents all of its subwords of some fixed length k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines the techniques of shotgun sequencing and SBH in a novel way. Our preliminary investigations suggest that the algorithm will be fast and practical for DNA sequence assembly.
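To make the two input models concrete, the following minimal Python sketch computes the quantities each method starts from: the set of all length-k subwords that SBH works with, and the pairwise fragment overlap that conventional shotgun assembly relies on. The function names `spectrum` and `overlap` and the toy sequences are illustrative assumptions; this is not the algorithm proposed in the paper, and it ignores sequencing errors and reverse complements.

```python
def spectrum(seq: str, k: int) -> set:
    """Return the set of all length-k subwords (k-mers) of seq.

    This is the idealized, error-free input assumed for SBH.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap(a: str, b: str) -> int:
    """Return the length of the longest suffix of a that is a prefix of b.

    Exact matching only; real shotgun data would need an
    error-tolerant comparison (an assumption left out here).
    """
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


if __name__ == "__main__":
    # Toy data, chosen only for illustration.
    print(sorted(spectrum("ACGTAC", 3)))   # ['ACG', 'CGT', 'GTA', 'TAC']
    print(overlap("ACGTAC", "TACGGA"))     # 3, the shared "TAC"
```

In this toy case, the fragments "ACGTAC" and "TACGGA" overlap by three bases, so an overlap-based method could merge them into "ACGTACGGA", while an SBH-style method would instead try to reconstruct the sequence from its 3-mer set alone.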
