A software system for data analysis in automated DNA sequencing.

We describe software for gel image analysis and base-calling in fluorescence-based sequencing, consisting of two primary programs, BaseFinder and GelImager. BaseFinder is a framework for trace processing, analysis, and base-calling. It is highly extensible: trace analysis and processing modules can be added without recompilation. Scripting capabilities, combined with modularity and multilane handling, allow the user to customize BaseFinder for virtually any type of trace processing. We have developed an extensive set of data processing and analysis modules for use with the program in fluorescence-based sequencing. GelImager is a framework for gel image manipulation. It can be used for gel visualization and lane retracking, and as a front end to the Washington University Getlanes program. The programs were designed using a cross-platform development environment and currently run on Windows NT, Windows 95, Openstep/Mach, and Rhapsody. Work is ongoing to deploy the software on additional platforms, including Solaris, Linux, and MacOS. The software has been thoroughly tested and debugged in the analysis of >2 million bp of raw sequence data from human chromosome 19 region q13. Overall sequencing accuracy was measured on a significant subset of these data, consisting of approximately 600 sequences, by comparing the individual shotgun sequences against the final assembled contigs. We also report results from experiments that analyzed the accuracy of this software and of two other well-known base-calling programs in sequencing the M13mp18 vector. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF025422.]
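As an illustration of the accuracy measurement mentioned above (comparing an individual read against a trusted consensus), the following is a minimal sketch only. It is not the procedure used in the paper, which the abstract does not specify. It assumes that errors are counted as the minimum number of substitutions, insertions, and deletions needed to match the read against any stretch of the reference, and that accuracy is one minus errors per called base. The function names `count_errors` and `accuracy` are hypothetical.

```python
def count_errors(read: str, reference: str) -> int:
    """Minimum edits (substitutions, insertions, deletions) needed to match
    `read` against any substring of `reference` (semi-global alignment).

    Assumption: this edit-distance definition of "error" is chosen for
    illustration; the source does not state how errors were counted.
    """
    # prev[j]: best cost for the read prefix seen so far, ending at reference
    # position j. The first row is all zeros, so the read may start anywhere.
    prev = [0] * (len(reference) + 1)
    for i, base in enumerate(read, start=1):
        curr = [i] + [0] * len(reference)
        for j, ref_base in enumerate(reference, start=1):
            curr[j] = min(
                prev[j - 1] + (base != ref_base),  # match or substitution
                prev[j] + 1,                       # extra base in the read
                curr[j - 1] + 1,                   # base missing from the read
            )
        prev = curr
    # The read may also end anywhere in the reference.
    return min(prev)


def accuracy(read: str, reference: str) -> float:
    """Fraction of called bases that are error-free under the assumption above."""
    if not read:
        raise ValueError("read must contain at least one base")
    return 1.0 - count_errors(read, reference) / len(read)
```

For example, `accuracy("ACGA", "TTACGTTT")` counts one substitution in four called bases. A production comparison would more likely rely on an established alignment tool and would also trim vector and low-quality ends before scoring.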
