VGA: A method for viral quasispecies assembly from ultra-deep sequencing data

We present VGA, an accurate method for viral quasispecies assembly from ultra-deep sequencing data. The proposed approach combines a high-fidelity sequencing protocol with an accurate assembly method, referred to as the Viral Genome Assembler (VGA). The protocol eliminates sequencing errors by attaching individual barcodes to the sequencing fragments. Results on both synthetic and real datasets show that our method accurately assembles HIV viral quasispecies and detects rare quasispecies that were previously undetectable because of sequencing errors. VGA outperforms state-of-the-art methods for viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads. VGA is freely available at http://genetics.cs.ucla.edu/vga/
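
To illustrate how barcodes attached to sequencing fragments can help remove sequencing errors, the following is a minimal sketch and not the VGA implementation. It assumes that reads sharing a barcode originate from the same fragment, are already aligned, and have equal length; the function name `barcode_consensus`, the `min_reads` threshold, and the toy reads are hypothetical choices made only for this example. Each barcode group is collapsed to a per-position majority consensus, so an error present in a minority of the copies is outvoted.

```python
from collections import Counter, defaultdict


def barcode_consensus(reads, min_reads=2):
    """Collapse reads that share a barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs. Sequences with the same
    barcode are assumed to be aligned and of equal length (an assumption
    of this sketch, not something stated in the abstract).
    min_reads: barcode groups with fewer reads are dropped, because a
    single read gives no way to tell an error from a true variant.
    """
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        # Majority vote at every position across the reads of this barcode.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data: each barcode group of three reads contains one read with an error.
reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACCT"),
    ("GGC", "TTGA"), ("GGC", "TTGA"), ("GGC", "TAGA"),
    ("CCA", "ACGT"),  # singleton barcode, dropped by min_reads
]
corrected = barcode_consensus(reads)
print(corrected)
```

In this toy input the erroneous base in each three-read group is outvoted by the two correct copies, and the singleton barcode is discarded. A real protocol would also need to handle ties, unaligned reads, and errors in the barcodes themselves, none of which this sketch addresses.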