CloneQC: lightweight sequence verification for synthetic biology

Synthetic biology projects aim to produce physical DNA that matches a designed target sequence. Chemically synthesized oligomers are generally used as the starting point for building progressively larger sequences. Because chemical synthesis is error-prone, these oligomers can differ from the target sequence at many positions. As oligomers are joined into ever larger synthetic intermediates, quality control becomes essential to eliminate intermediates with errors and retain only those DNA molecules that are error-free with respect to the target. This step is often performed by transforming bacteria with the synthetic DNA and sequencing colonies until a clone with a perfect sequence is identified. Here we present CloneQC, a lightweight software pipeline, available both as a free web server and as source code, that performs quality control on sequenced clones. The input to the server is a list of target sequences together with forward and reverse reads for each clone. The server generates summary statistics (error rates and success rates for each target) and a detailed report of perfect clones. This software will be useful to laboratories conducting in-house DNA synthesis and is available at http://cloneqc.thruhere.net/ and as Berkeley Software Distribution (BSD) licensed source code.
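The per-target summary described above can be illustrated with a small sketch. This is not CloneQC's actual implementation; it assumes, hypothetically, that each read has already been trimmed to the cloned insert, that the reverse read is reverse-complemented before comparison, that differences are counted as Levenshtein edit distance, and that a clone is "perfect" only when both reads match the target exactly. The names `Clone`, `summarize`, `edit_distance` and the conservative "larger of the two read distances" rule are all illustrative assumptions.

```python
from dataclasses import dataclass

# Complement table for DNA bases (N is left as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


@dataclass
class Clone:
    name: str
    target: str   # name of the designed target this clone should match
    forward: str  # forward read, assumed already trimmed to the insert
    reverse: str  # reverse read as sequenced (opposite strand), assumed trimmed


def summarize(targets: dict[str, str], clones: list[Clone]) -> dict[str, dict]:
    """Hypothetical per-target error rate, success rate and list of perfect clones."""
    report = {name: {"clones": 0, "errors": 0, "bases": 0, "perfect": []}
              for name in targets}
    for clone in clones:
        ref = targets[clone.target]
        fwd_err = edit_distance(clone.forward, ref)
        rev_err = edit_distance(reverse_complement(clone.reverse), ref)
        # Assumed rule: score the clone by its worse read, so "perfect"
        # requires both reads to agree exactly with the target.
        clone_err = max(fwd_err, rev_err)
        entry = report[clone.target]
        entry["clones"] += 1
        entry["errors"] += clone_err
        entry["bases"] += len(ref)
        if clone_err == 0:
            entry["perfect"].append(clone.name)
    for entry in report.values():
        n = entry["clones"]
        entry["error_rate"] = entry["errors"] / entry["bases"] if entry["bases"] else 0.0
        entry["success_rate"] = len(entry["perfect"]) / n if n else 0.0
    return report


# Toy example: one target, one perfect clone and one clone with a substitution.
targets = {"seg1": "ATGGCTAGC"}
clones = [
    Clone("c1", "seg1", "ATGGCTAGC", reverse_complement("ATGGCTAGC")),
    Clone("c2", "seg1", "ATGGCTTGC", reverse_complement("ATGGCTTGC")),
]
report = summarize(targets, clones)
print(report["seg1"]["perfect"], report["seg1"]["success_rate"])  # ['c1'] 0.5
```

Taking the worse of the two reads is a deliberately conservative choice for the sketch: a clone is only reported as perfect when the forward and reverse evidence both agree with the design. A real pipeline would also need to align untrimmed reads to the target and handle partial coverage, which this sketch does not attempt.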
