glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data

Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in one of a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors to quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data, and supports efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data), and it has been instrumental in the analysis of complex data sets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.
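To make the two central ideas concrete (a descriptor that says which column holds which field, and an intersection of two lists of genomic intervals), here is a minimal sketch in plain Python. It does not use glbase or reproduce its API: the descriptor layout `BED_LIKE` and the helpers `parse_rows`, `overlaps` and `intersect` are hypothetical names invented for illustration, and the coordinates are assumed to be half-open.

```python
import csv
import io

# Hypothetical descriptor (not glbase's format): field name -> column index.
BED_LIKE = {"chrom": 0, "start": 1, "end": 2, "name": 3}


def parse_rows(handle, descriptor, delimiter="\t"):
    """Turn each delimited row into a dict keyed by the descriptor's field names."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        row = {key: fields[col] for key, col in descriptor.items()}
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base.

    Assumes half-open coordinates: start is included, end is excluded.
    """
    return a["chrom"] == b["chrom"] and a["start"] < b["end"] and b["start"] < a["end"]


def intersect(list_a, list_b):
    """Return the items of list_a that overlap any item of list_b.

    This is a naive scan over every pair, fine for small lists only.
    """
    return [a for a in list_a if any(overlaps(a, b) for b in list_b)]


# Made-up example data standing in for two tab-separated files.
peaks = parse_rows(
    io.StringIO("chr1\t100\t200\tpeak1\nchr1\t500\t600\tpeak2\nchr2\t100\t200\tpeak3\n"),
    BED_LIKE,
)
genes = parse_rows(
    io.StringIO("# comment\nchr1\t150\t400\tgeneA\nchr2\t300\t400\tgeneB\n"),
    BED_LIKE,
)
shared = intersect(peaks, genes)
print([row["name"] for row in shared])
```

The pairwise scan is quadratic in the list sizes, so it only suits small inputs; a tool aimed at genome-scale files would presumably rely on sorted or indexed intervals instead.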
