AdapterRemoval v2: rapid adapter trimming, identification, and read merging

Background
As high-throughput sequencing platforms produce longer and longer reads, sequences generated from short inserts, such as those obtained from fossil and degraded material, are increasingly expected to contain adapter sequences. Efficient adapter trimming algorithms are also needed to process the growing amount of data generated per sequencing run.

Findings
We introduce AdapterRemoval v2, a major revision of AdapterRemoval v1, which introduces (i) striking improvements in throughput, through the use of single instruction, multiple data (SIMD; SSE1 and SSE2) instructions and multi-threading support, (ii) the ability to handle datasets containing reads or read-pairs with different adapters or adapter pairs, (iii) simultaneous demultiplexing and adapter trimming, (iv) the ability to reconstruct adapter sequences from paired-end reads for poorly documented data sets, and (v) native gzip and bzip2 support.

Conclusions
We show that AdapterRemoval v2 compares favorably with existing tools, while offering superior throughput to most alternatives examined here, for both single- and multi-threaded operation.
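To illustrate what 3' adapter trimming involves, the sketch below slides the 5' end of a known adapter along a read, scores each ungapped alignment (+1 per match, -1 per mismatch, 0 where either base is N), and cuts the read at the best-scoring start whose mismatch rate stays under a threshold. This is a simplified, single-threaded sketch of the general technique, not AdapterRemoval's implementation: the function names `find_adapter_start` and `trim_adapter`, the 1/3 mismatch-rate default, the 3-base minimum overlap and the example sequences are all hypothetical choices made for illustration. It ignores base qualities, paired-end information, demultiplexing and SIMD acceleration.

```python
from typing import Optional


def find_adapter_start(read: str, adapter: str,
                       max_mismatch_rate: float = 1 / 3,   # hypothetical default
                       min_overlap: int = 3) -> Optional[int]:  # hypothetical default
    """Return the 0-based read position where the adapter appears to start,
    or None if no ungapped alignment passes the mismatch threshold.

    Each candidate start aligns the adapter's 5' end against the read's
    3' end without gaps; matches score +1, mismatches -1, and positions
    containing an N score 0.
    """
    best_score, best_start = None, None
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # every later start has an even shorter overlap
        matches = mismatches = 0
        for r, a in zip(read[start:start + overlap], adapter[:overlap]):
            if r == "N" or a == "N":
                continue  # ambiguous bases neither help nor hurt the score
            if r == a:
                matches += 1
            else:
                mismatches += 1
        if mismatches > max_mismatch_rate * overlap:
            continue  # too divergent to be treated as adapter
        score = matches - mismatches
        # Strict '>' keeps the leftmost start when scores tie.
        if best_score is None or score > best_score:
            best_score, best_start = score, start
    return best_start


def trim_adapter(read: str, adapter: str, **kwargs) -> str:
    """Cut the read at the detected adapter start (unchanged if none found)."""
    start = find_adapter_start(read, adapter, **kwargs)
    return read if start is None else read[:start]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"        # example adapter sequence
    insert = "ACGTTGCAACGTTGCA"      # 16-bp insert, shorter than the read
    read = insert + adapter[:10]     # read runs 10 bases into the adapter
    print(trim_adapter(read, adapter))  # -> ACGTTGCAACGTTGCA
```

Scoring mismatches against matches, rather than requiring an exact match, lets the search tolerate sequencing errors inside the adapter. The per-position base comparisons in the inner loop are the kind of work that SIMD instructions can perform on many bases at once, which is one way a tool can raise throughput without changing the scoring scheme.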
