The Principles of Shotgun Sequencing and Automated Fragment Assembly

[1]  S. Benzer On the Topology of the Genetic Fine Structure. , 1959, Proceedings of the National Academy of Sciences of the United States of America.

[2]  M O Dayhoff Computer aids to protein sequence determination. , 1965, Journal of theoretical biology.

[3]  J. T. Madison,et al.  Structure of a Ribonucleic Acid , 1965, Science.

[4]  C R Merril,et al.  Reconstruction of protein and nucleic acid sequences: alanine transfer ribonucleic acid. , 1965, Science.

[5]  C R Merril,et al.  Reconstruction of protein and nucleic acid sequences. IV. The algebra of free monoids and the fragmentation stratagem. , 1966, The Bulletin of mathematical biophysics.

[6]  Marvin B. Shapiro An Algorithm for Reconstructing Protein and RNA Sequences , 1967, JACM.

[7]  G. Hutchinson,et al.  Evaluation of polymer sequence fragment data using graph theory. , 1969, The Bulletin of mathematical biophysics.

[8]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[9]  Stephen A. Cook,et al.  The complexity of theorem-proving procedures , 1971, STOC.

[10]  Donald E. Knuth,et al.  The Art of Computer Programming: Volume 3: Sorting and Searching , 1998 .

[11]  John E. Hopcroft,et al.  Complexity of Computer Computations , 1974, IFIP Congress.

[12]  F. Crick,et al.  Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid , 1974, Nature.

[13]  Edward M. McCreight,et al.  A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.

[14]  John Carbon,et al.  A colony bank containing synthetic ColE1 hybrid plasmids representative of the entire E. coli genome , 1976, Cell.

[15]  W. Gilbert,et al.  A new method for sequencing DNA. , 1977, Proceedings of the National Academy of Sciences of the United States of America.

[16]  R. Staden Sequence data handling by computer. , 1977, Nucleic acids research.

[17]  F. Sanger,et al.  DNA sequencing with chain-terminating inhibitors. , 1977, Proceedings of the National Academy of Sciences of the United States of America.

[18]  R. Staden,et al.  Nucleotide sequence of bacteriophage G4 DNA , 1978, Nature.

[19]  J. Messing,et al.  Methylation of single-stranded DNA in vitro introduces new restriction endonuclease cleavage sites , 1978, Nature.

[20]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[21]  R. Staden Further procedures for sequence analysis by computer. , 1978, Nucleic acids research.

[22]  T. Gingeras,et al.  Computer programs for the assembly of DNA sequences. , 1979, Nucleic acids research.

[23]  R. Staden A strategy of DNA sequencing employing computer programs. , 1979, Nucleic acids research.

[24]  Shimon Even,et al.  Graph Algorithms , 1979 .

[25]  R. Polozov,et al.  On the algorithms for determining the primary structure of biopolymers , 1979 .

[26]  Peter H. Sellers,et al.  The Theory and Computation of Evolutionary Distances: Pattern Recognition , 1980, J. Algorithms.

[27]  David Maier,et al.  On Finding Minimal Length Superstrings , 1980, J. Comput. Syst. Sci.

[28]  R. Staden A new computer method for the storage and manipulation of DNA gel reading data. , 1980, Nucleic acids research.

[29]  S. Anderson,et al.  Shotgun DNA sequencing using cloned DNase I-generated fragments , 1981, Nucleic Acids Res.

[30]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[31]  F. Sanger,et al.  Sequence and organization of the human mitochondrial genome , 1981, Nature.

[32]  J Messing,et al.  A system for shotgun DNA sequencing. , 1981, Nucleic acids research.

[33]  E. Geiduschek,et al.  Analysis of transcription of the human Alu family ubiquitous repeating element by eukaryotic RNA polymerase III. , 1981, Nucleic acids research.

[34]  R. Staden Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. , 1982, Nucleic acids research.

[35]  R Staden,et al.  An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. , 1982, Nucleic acids research.

[36]  Hans Söderlund,et al.  Algorithms for Some String Matching Problems Arising in Molecular Genetics , 1983, IFIP Congress.

[37]  P. Deininger,et al.  Approaches to rapid DNA sequence analysis. , 1983, Analytical biochemistry.

[38]  J. Gallant The complexity of the overlap method for sequencing biopolymers. , 1983, Journal of theoretical biology.

[39]  P. L. Deininger,et al.  DNA sequence and expression of the B95-8 Epstein-Barr virus genome , 1984, Nature.

[40]  Hans Söderlund,et al.  SEQAID: a DNA sequence assembling program based on a mathematical model , 1984, Nucleic Acids Res.

[41]  Esko Ukkonen,et al.  Finding Approximate Patterns in Strings , 1985, J. Algorithms.

[42]  Lloyd M. Smith,et al.  Fluorescence detection in automated DNA sequence analysis , 1986, Nature.

[43]  L. Hood,et al.  Automated DNA sequencing and analysis of the human genome. , 1987, Genomics.

[44]  E. Lander,et al.  Genomic mapping by fingerprinting random clones: a mathematical analysis. , 1988, Genomics.

[45]  Esko Ukkonen,et al.  A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings , 1988, Theor. Comput. Sci.

[46]  M. Karas,et al.  Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. , 1988, Analytical chemistry.

[47]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[48]  Jonathan S. Turner,et al.  Approximation Algorithms for the Shortest Common Superstring Problem , 1989, Inf. Comput.

[49]  R. Drmanac,et al.  Sequencing of megabase plus DNA by hybridization: theory of the method. , 1989, Genomics.

[50]  P. Pevzner l-Tuple DNA sequencing: computer analysis. , 1989, Journal of biomolecular structure & dynamics.

[51]  Eugene W. Myers,et al.  Suffix arrays: a new method for on-line string searches , 1990, SODA '90.

[52]  S. Antonarakis The mapping and sequencing of the human genome. , 1990, Southern medical journal.

[53]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[54]  J. D. Watson The human genome project: past, present, and future. , 1990, Science.

[55]  J R Thompson,et al.  Scanning tunneling microscopy and spectroscopy of plasmid DNA. , 1990, Scanning microscopy.

[56]  Peter C. Cheeseman,et al.  Where the Really Hard Problems Are , 1991, IJCAI.

[57]  C. Caskey,et al.  Closure strategies for random DNA sequencing , 1991 .

[58]  Tao Jiang,et al.  Linear approximation of shortest superstrings , 1991, STOC '91.

[59]  R. Staden,et al.  A sequence assembly and editing program for efficient management of large projects. , 1991, Nucleic acids research.

[60]  T. Hunkapiller,et al.  Sequence accuracy of large DNA sequencing projects. , 1992, DNA sequence : the journal of DNA sequencing and mapping.

[61]  J. Kececioglu Exact and approximation algorithms for DNA sequence reconstruction , 1992 .

[62]  M. Waterman,et al.  The accuracy of DNA sequences: estimating sequence quality. , 1992, Genomics.

[63]  Esko Ukkonen,et al.  Approximate String Matching with q-grams and Maximal Matches , 1992, Theor. Comput. Sci.

[64]  X. Huang,et al.  A contig assembly program based on sensitive detection of fragment overlaps. , 1992, Genomics.

[65]  Kun-Mao Chao,et al.  Aligning two sequences within a specified diagonal band , 1992, Comput. Appl. Biosci.

[66]  F Khurshid,et al.  Error analysis in manual and automated DNA sequencing. , 1993, Analytical biochemistry.

[67]  L. M. Smith,et al.  An adaptive, object oriented strategy for base calling in DNA sequence analysis. , 1993, Nucleic acids research.

[68]  James B. Golden,et al.  Pattern Recognition for Automated DNA Sequencing: I. On-Line Signal Conditioning and Feature Extraction for Basecalling , 1993, ISMB.

[69]  D. Schlessinger,et al.  Ordered shotgun sequencing, a strategy for integrated mapping and sequencing of YAC clones. , 1993, Genomics.

[70]  C. Tibbetts,et al.  Neural Networks for Automated Base-calling of Gel-based DNA Sequencing Ladders , 1994 .

[71]  V. Solovyev,et al.  Assignment of position-specific error probability to primary DNA sequence data. , 1994, Nucleic acids research.

[72]  G. Evans,et al.  Genomic sequence sampling: a strategy for high resolution sequence-based physical mapping of complex genomes , 1994, Nature Genetics.

[73]  Clifford Stein,et al.  Long tours and short superstrings , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[74]  G. Hartzell,et al.  DNA sequence confidence estimation. , 1994, Genomics.

[75]  Mark J. Miller,et al.  A Quantitative Comparison of DNA Sequence Assembly Programs , 1994, J. Comput. Biol.

[76]  J Quackenbush,et al.  Physical mapping of complex genomes by sampled sequencing: a theoretical analysis. , 1995, Genomics.

[77]  Owen White,et al.  TIGR Assembler: A New Tool for Assembling Large Shotgun Sequencing Projects , 1995 .

[78]  R. Fleischmann,et al.  Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. , 1995, Science.

[79]  R Staden,et al.  The application of numerical estimates of base calling accuracy to DNA sequencing projects. , 1995, Nucleic acids research.

[80]  J. Bonfield,et al.  A new DNA sequence assembly program. , 1995, Nucleic acids research.

[81]  Eugene W. Myers,et al.  Toward Simplifying and Accurately Formulating Fragment Assembly , 1995, J. Comput. Biol.

[82]  J. Roach,et al.  Pairwise end sequencing: a unified approach to genomic mapping and sequencing. , 1995, Genomics.

[83]  G A Buck,et al.  Accuracy of automated DNA sequencing: a multi-laboratory comparison of sequencing results. , 1995, BioTechniques.

[84]  B. Barrell,et al.  Life with 6000 Genes , 1996, Science.

[85]  X. Huang,et al.  An improved sequence assembly program. , 1996, Genomics.

[86]  J. Craig Venter,et al.  A new strategy for genome sequencing , 1996, Nature.

[87]  Clifford Stein,et al.  A 2 2/3-Approximation Algorithm for the Shortest Superstring Problem , 1995 .

[88]  Elizabeth Sweedyk A 2 1/2-approximation algorithm for shortest common superstring , 1996 .

[89]  D. Hartl,et al.  Sequence scanning: A method for rapid sequence acquisition from large-fragment DNA clones. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[90]  Dan Gusfield Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[91]  H R Garner,et al.  PRIMO: A primer design program that applies base quality statistics for automated large-scale DNA sequencing. , 1997, Genomics.

[92]  N. W. Davis,et al.  The complete genome sequence of Escherichia coli K-12. , 1997, Science.

[93]  Sophie Schbath,et al.  Coverage Processes in Physical Mapping by Anchoring Random Clones , 1997, J. Comput. Biol.

[94]  Steven Skiena,et al.  Trie-Based Data Structures for Sequence Assembly , 1997, CPM.

[95]  J. Weber,et al.  Human whole-genome shotgun sequencing. , 1997, Genome research.

[96]  P. Green,et al.  Against a whole-genome shotgun. , 1997, Genome research.

[97]  Eugene W. Myers,et al.  ReAligner: a program for refining DNA sequence multi-alignments , 1997, RECOMB '97.

[98]  The Sanger Centre Toward a complete human genome sequence. , 1998, Genome research.

[99]  J. Berg Genome sequence of the nematode C. elegans: a platform for investigating biology. , 1998, Science.

[100]  M. Westphall,et al.  A software system for data analysis in automated DNA sequencing. , 1998, Genome research.

[101]  P Green,et al.  Base-calling of automated sequencer traces using phred. II. Error probabilities. , 1998, Genome research.

[102]  P. Richterich,et al.  Estimation of errors in "raw" DNA sequences: a validation study. , 1998, Genome research.

[103]  E. Eichler,et al.  Masquerading repeats: paralogous pitfalls of the human genome. , 1998, Genome research.

[104]  P. Green,et al.  Base-calling of automated sequencer traces using phred. I. Accuracy assessment. , 1998, Genome research.

[105]  M. Ronaghi,et al.  A Sequencing Method Based on Real-Time Pyrophosphate , 1998, Science.

[106]  P. Green,et al.  Consed: a graphical tool for sequence finishing. , 1998, Genome research.

[107]  B. Berger,et al.  Sequencing a genome by walking with clone-end sequences: a mathematical analysis. , 1999 .

[108]  E. Marshall A High-Stakes Gamble on Genome Sequencing , 1999, Science.

[109]  Eugene W. Myers,et al.  Algorithms for whole genome shotgun sequencing , 1999, RECOMB.

[110]  S. Kim,et al.  AMASS: A Structured Pattern Matching Approach to Shotgun Sequence Assembly , 1998, J. Comput. Biol.

[111]  X. Huang,et al.  CAP3: A DNA sequence assembly program. , 1999, Genome research.

[112]  Stephen M. Mount,et al.  The genome sequence of Drosophila melanogaster. , 2000, Science.

[113]  R N Re,et al.  On the sequencing of the human genome. , 2000, Hypertension.

[114]  V. Thorsson,et al.  Parking strategies for genome sequencing. , 2000, Genome research.

[115]  E. Vermaas,et al.  In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[116]  Rithy K. Roth,et al.  Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays , 2000, Nature Biotechnology.

[117]  Steven Skiena,et al.  A case study in genome-level fragment assembly , 2000, Bioinform.

[118]  Nikos Kyrpides,et al.  Genomes OnLine Database (GOLD): a monitor of genome projects world-wide , 2001, Nucleic Acids Res.

[119]  D. Haussler,et al.  Assembly of the working draft of the human genome with GigAssembler. , 2001, Genome research.

[120]  L. Hillier,et al.  Theories and applications for sequencing randomly selected clones. , 2001, Genome research.

[121]  M Morris,et al.  Basecalling with LifeTrace. , 2001, Genome research.

[122]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[123]  Eugene W. Myers,et al.  The greedy path-merging algorithm for sequence assembly , 2001, RECOMB.

[124]  B. Berger,et al.  ARACHNE: a whole-genome shotgun assembler. , 2002, Genome research.

[125]  Eugene W. Myers,et al.  A sublinear algorithm for approximate keyword searching , 1994, Algorithmica.

[126]  Eugene W. Myers,et al.  Combinatorial algorithms for DNA sequence assembly , 1995, Algorithmica.

[127]  Craig A. Stewart,et al.  Introduction to computational biology , 2005 .

[128]  Pavel A. Pevzner,et al.  DNA physical mapping and alternating Eulerian cycles in colored graphs , 1995, Algorithmica.