TagGD: Fast and Accurate Software for DNA Tag Generation and Demultiplexing

Multiplexing is of vital importance for utilizing the full potential of next generation sequencing technologies. We here report TagGD (DNA-based Tag Generator and Demultiplexor), a fully-customisable, fast and accurate software package that can generate thousands of barcodes satisfying user-defined constraints and can guarantee full demultiplexing accuracy. The barcodes are designed to minimise their interference with the experiment. Insertion, deletion and substitution events are considered when designing and demultiplexing barcodes. 20,000 barcodes of length 18 were designed in 5 minutes and 2 million barcoded Illumina HiSeq-like reads generated with an error rate of 2% were demultiplexed with full accuracy in 5 minutes. We believe that our software meets a central demand in the current high-throughput biology and can be utilised in any field with ample sample abundance. The software is available on GitHub (https://github.com/pelinakan/UBD.git).

[1]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[2]  David R. Butenhof Programming with POSIX threads , 1993 .

[3]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[4]  Michael Zuker,et al.  Mfold web server for nucleic acid folding and hybridization prediction , 2003, Nucleic Acids Res..

[5]  Hanlee P. Ji,et al.  Next-generation DNA sequencing , 2008, Nature Biotechnology.

[6]  R. Knight,et al.  Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex , 2008, Nature Methods.

[7]  G. Hannon,et al.  DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis. , 2009, Genome research.

[8]  Qikai Xu,et al.  Design of 240,000 orthogonal 25mer DNA barcode probes , 2009, Proceedings of the National Academy of Sciences.

[9]  Daniel N. Frank,et al.  BARCRAWL and BARTAB: software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing , 2009, BMC Bioinformatics.

[10]  Hsien-Da Huang,et al.  Clusters of Nucleotide Substitutions and Insertion/Deletion Mutations Are Associated with Repeat Sequences , 2011, PLoS biology.

[11]  Joseph B Hiatt,et al.  Massively parallel functional dissection of mammalian enhancers in vivo , 2012, Nature Biotechnology.

[12]  Patrik L. Ståhl,et al.  Toward the single-hour high-quality genome. , 2012, Annual review of biochemistry.

[13]  Leonid V. Bystrykh,et al.  Generalized DNA Barcode Design Based on Hamming Codes , 2012, PloS one.