SePIA: RNA and small RNA sequence processing, integration, and analysis

Background: Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information. This is especially true in projects where individual processing and integrated analysis of both small RNA and complementary RNA data is needed. Such studies would benefit from a computational workflow that is easy to implement and standardizes the processing and analysis of both sequenced data types.

Results: We developed SePIA (Sequence Processing, Integration, and Analysis), a comprehensive small RNA and RNA workflow. It provides ready execution for over 20 commonly known RNA-seq tools on top of an established workflow engine and provides a dynamic pipeline architecture to manage, individually analyze, and integrate both small RNA and RNA data. Implementation with Docker makes SePIA portable and easy to run. We demonstrate the workflow's extensive utility with two case studies involving three breast cancer datasets. SePIA is straightforward to configure and organizes results into a perusable HTML report. Furthermore, the underlying pipeline engine supports computational resource management for optimal performance.

Conclusion: SePIA is an open-source workflow introducing standardized processing and analysis of RNA and small RNA data. SePIA's modular design enables robust customization to a given experiment while maintaining overall workflow structure. It is available at http://anduril.org/sepia.
