Tools for Next-Generation Sequencing Data Analysis

As next-generation sequencing (NGS) technology continues to improve, the amount of data generated per run grows rapidly. Bioinformatics analysis nevertheless remains the primary bottleneck in NGS studies. Not all researchers have access to a bioinformatics core or a dedicated bioinformatician. In addition, much of the software for NGS analysis is written to run in a Unix/Linux environment, so researchers unfamiliar with the Unix command line may be unable to use these tools or may face a steep learning curve in trying to do so. Commercial packages exist, such as the CLC Genomics Workbench, DNAnexus, and GenomeQuest, but they often rely on proprietary algorithms for data analysis and may be costly. Galaxy addresses these problems by incorporating popular open-source, community-developed Linux command-line tools into an easy-to-use web-based environment. Once sequence data have been uploaded and mapped, a variety of workflows built on open-source tools are available for NGS analyses. These include peak calling for ChIP-Seq (MACS, GeneTrack indexer, Peak predictor), RNA-Seq analysis (TopHat, Cufflinks), and detection of small insertions, deletions, and SNPs using SAMtools. Any researcher can apply a workflow to their NGS data and retrieve results without having to interact with a command line. Additionally, because Galaxy is cloud-based, researchers do not need expensive computing hardware to perform analyses. In this presentation we will provide an overview of two popular open-source RNA-Seq analysis tools, TopHat and Cufflinks, and demonstrate how they can be used in Galaxy.