Improving the Quality of Automatic DNA Sequence Assembly Using Fluorescent Trace-Data Classifications

Virtually all large-scale sequencing projects use automatic sequence-assembly programs to aid in the determination of DNA sequences. The computer-generated assemblies require substantial hand-editing to transform them into submissions for GenBank. As the size of sequencing projects increases, it becomes essential to improve the quality of the automated assemblies so that this time-consuming hand-editing may be reduced. Current ABI sequencing technology uses base calls made from fluorescently labeled DNA fragments run on gels. We present a new representation for the fluorescent trace data associated with individual base calls. This representation can be used before, during, and after fragment assembly to improve the quality of assemblies. We demonstrate one such use, end-trimming of sub-optimal data, which results in a significant improvement in the quality of subsequent assemblies.
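To make the idea of end-trimming concrete, the following is a minimal sketch, not the method of the paper. It assumes each base call has already been assigned a trace-data class, and that one class marks acceptable data. The label names (`"good"`, `"poor"`), the function `trim_ends`, and the `window` and `min_good` parameters are all hypothetical; the abstract does not specify the classification scheme or the trimming rule.

```python
from typing import List, Tuple

GOOD = "good"  # hypothetical label for an acceptable trace-data class


def trim_ends(classes: List[str], window: int = 5, min_good: int = 4) -> Tuple[int, int]:
    """Return a half-open interval (start, end) of base calls to keep.

    Assumed rule (illustrative only): keep the span between the first and
    last windows that contain at least `min_good` bases labeled GOOD, then
    tighten each boundary to the nearest GOOD base.
    """
    n = len(classes)
    good = [1 if c == GOOD else 0 for c in classes]

    def window_ok(i: int) -> bool:
        return sum(good[i:i + window]) >= min_good

    # Scan from the 5' end for the first acceptable window.
    start = next((i for i in range(n - window + 1) if window_ok(i)), None)
    if start is None:
        return (0, 0)  # no acceptable region: trim the whole read

    # Scan from the 3' end for the last acceptable window.
    end = next(i + window for i in range(n - window, -1, -1) if window_ok(i))

    # Tighten both boundaries so the kept span starts and ends on a GOOD base.
    while not good[start]:
        start += 1
    while not good[end - 1]:
        end -= 1
    return (start, end)


# Toy read: poor data at both ends, acceptable data in the middle.
labels = ["poor"] * 3 + ["good"] * 10 + ["poor"] * 4
start, end = trim_ends(labels)
```

The trimmed interval would then be applied to the base-call string before the read is passed to an assembler; the window size and threshold here are arbitrary and would need tuning against real trace data.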
