The Genominator User Guide

The Genominator package provides an interface for storing and retrieving genomic data, together with some additional functionality aimed at high-throughput sequence data. The intent is that retrieval and summarization will be fast enough to enable online experimentation with the data. We have used the package to analyze tiling arrays and (perhaps more appropriately) RNA-Seq data consisting of more than 400 million reads. The canonical use case at the core of the package is summarizing the data over a large number of genomic regions. The standard example is to count, for each annotated human exon, the number of reads that land in that exon, across all experimental samples. Data is stored in a SQLite database, and as such the package makes it possible to work with very large datasets in limited memory. However, working with SQLite databases is limited by I/O (disk speed), and substantial performance gains are possible by using a fast disk. Work using this package should cite [1].
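
To make the canonical use case concrete, the following is a minimal, language-agnostic sketch of counting reads per region with SQLite. It is not the Genominator interface: it uses Python's standard sqlite3 module, and the table layout (reads, regions), the column names, and the function count_reads_per_region are all hypothetical, chosen only for illustration. Each read is assumed to be represented by a single genomic position.

```python
import sqlite3


def count_reads_per_region(reads, regions):
    """Count reads falling inside each region, separately for each sample.

    reads:   iterable of (sample, chr, location) tuples (hypothetical layout)
    regions: iterable of (name, chr, start, end) tuples, bounds inclusive
    Returns a list of (region_name, sample, count) tuples.
    Region/sample pairs with no reads do not appear in the result.
    """
    # An in-memory database keeps the sketch self-contained; an on-disk
    # file is what lets SQLite handle data larger than available memory.
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE reads (sample TEXT, chr TEXT, location INTEGER)"
    )
    con.execute(
        "CREATE TABLE regions "
        "(name TEXT, chr TEXT, region_start INTEGER, region_end INTEGER)"
    )
    con.executemany("INSERT INTO reads VALUES (?, ?, ?)", reads)
    con.executemany("INSERT INTO regions VALUES (?, ?, ?, ?)", regions)

    # An index on (chr, location) lets the range lookup avoid a full scan.
    con.execute("CREATE INDEX reads_idx ON reads (chr, location)")

    rows = con.execute(
        """
        SELECT g.name, r.sample, COUNT(*)
        FROM regions AS g
        JOIN reads AS r
          ON r.chr = g.chr
         AND r.location BETWEEN g.region_start AND g.region_end
        GROUP BY g.name, r.sample
        ORDER BY g.name, r.sample
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    example_reads = [
        ("s1", "chr1", 5),
        ("s1", "chr1", 8),
        ("s1", "chr1", 12),
        ("s2", "chr1", 7),
        ("s1", "chr2", 5),  # no region on chr2, so this read is not counted
    ]
    example_regions = [("exonA", "chr1", 1, 10), ("exonB", "chr1", 11, 20)]
    for row in count_reads_per_region(example_reads, example_regions):
        print(row)
```

Because the join and the grouping run inside the database, only the final counts are returned to the caller, which is the property that makes this style of summarization feasible for datasets that do not fit in memory.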