CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing

Next-generation sequencing (NGS) is a widely used technology in biomedical research for understanding the molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the rapidly growing volume of NGS data, but using them often requires bioinformatics skills, tedious work and a significant amount of time. To simplify data processing and bridge the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform that brings together established bioinformatics pipelines to provide fully automated NGS data analysis and sharing through a user-friendly website. The portal currently provides 16 standard pipelines for analysing DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing data, and can readily be extended with new pipelines. Users can upload raw data in FASTQ format and submit jobs in a few clicks; the results are then accessible through the portal, where they can be viewed, downloaded and shared in real time. Depending on the pipeline, the output can be used directly as a final report or as input for other tools. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal.
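The portal takes raw reads in FASTQ format as input. As a hedged illustration of that format only (this is not code from CSI NGS Portal), the sketch below reads FASTQ records and applies the kind of basic structural checks a user might run before uploading. It assumes uncompressed, four-line records (`@id`, sequence, `+` separator, quality string) and Phred+33 quality encoding; the function names `read_fastq` and `mean_phred` are hypothetical.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples from FASTQ lines.

    Assumes the common four-line layout (an assumption, since wrapped
    multi-line records also exist): @id / sequence / + / quality.
    Raises ValueError on structural problems.
    """
    # Strip newlines and drop blank lines at the end of the input.
    cleaned = [line.rstrip("\n") for line in lines]
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    if len(cleaned) % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    for i in range(0, len(cleaned), 4):
        header, seq, sep, qual = cleaned[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1}: header must start with '@'")
        if not sep.startswith("+"):
            raise ValueError(f"line {i + 3}: separator must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"line {i + 1}: sequence/quality length mismatch")
        # Return the read ID without the leading '@'.
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 offset assumed)."""
    if not qual:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in qual) / len(qual)


if __name__ == "__main__":
    example = [
        "@read1\n", "ACGT\n", "+\n", "IIII\n",
        "@read2\n", "GGCA\n", "+read2\n", "!!5I\n",
    ]
    for read_id, seq, qual in read_fastq(example):
        print(read_id, seq, mean_phred(qual))
```

Running the script prints `read1 ACGT 40.0` and `read2 GGCA 15.0`, since `I` encodes Phred 40 and `!` encodes Phred 0 under the +33 offset. Gzipped input, which is common for FASTQ files, would need to be decompressed first (for example with Python's `gzip` module).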