NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

Next generation sequencing (NGS) technologies provide a high-throughput means of generating large amounts of sequence data. However, quality control (QC) of the sequence data generated by these technologies is essential for meaningful downstream analysis. Further, fast and efficient processing tools are required to handle the large volume of data. Here, we have developed an application, NGS QC Toolkit, for quality checking of sequence data and filtering to retain high-quality data. The toolkit is a standalone, open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application are implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using the Roche 454 and Illumina platforms, along with additional tools to aid QC (a sequence format converter and trimming tools) and analysis (statistics tools). A variety of options are provided so that QC can be performed with user-defined parameters. The toolkit is expected to be very useful for QC of NGS data and thereby to facilitate better downstream analysis.
