CruzDB: software for annotation of genomic intervals with UCSC genome-browser database

Motivation: The biological significance of genomic features is often context dependent. Annotating a particular dataset with existing external data can provide insight into function.

Results: We present CruzDB, a fast and intuitive programmatic interface to the University of California, Santa Cruz (UCSC) genome browser that facilitates integrative analyses of diverse local and remotely hosted datasets. We showcase the syntax of CruzDB using microRNA binding sites as examples, and further demonstrate its utility with three biological discoveries. First, DNA replication timing is stratified within gene regions: exons tend to replicate early and introns late during S phase. Second, several non-coding variants associated with cognitive functions map to lincRNA transcripts of relevant function, suggesting a potential role for these regulatory RNAs in neuronal diseases. Third, lamina-associated genomic regions are highly enriched in olfaction-related genes, indicating a role of nuclear organization in their regulation.
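To make the idea of annotating genomic intervals concrete, the following is a minimal sketch in plain Python of the underlying operation: finding which reference features overlap each query interval. It is not CruzDB's API. The `Interval` type, the `annotate` function, and all coordinates and names are hypothetical and chosen only for illustration. Coordinates are assumed to be 0-based and half-open, the convention used in UCSC database tables.

```python
from collections import namedtuple

# Hypothetical interval record; coordinates are 0-based, half-open [start, end).
Interval = namedtuple("Interval", ["chrom", "start", "end", "name"])


def annotate(queries, features):
    """Map each query name to the sorted names of features it overlaps."""
    # Group features by chromosome so each query is only compared
    # against features on the same chromosome.
    by_chrom = {}
    for feature in features:
        by_chrom.setdefault(feature.chrom, []).append(feature)

    result = {}
    for query in queries:
        hits = [
            feature.name
            for feature in by_chrom.get(query.chrom, [])
            # Two half-open intervals overlap when each starts before the other ends.
            if feature.start < query.end and query.start < feature.end
        ]
        result[query.name] = sorted(hits)
    return result


# Illustrative (made-up) reference features and query sites.
features = [
    Interval("chr1", 100, 200, "exon1"),
    Interval("chr1", 200, 500, "intron1"),
    Interval("chr2", 100, 200, "exon2"),
]
queries = [
    Interval("chr1", 150, 250, "site_a"),
    Interval("chr1", 600, 650, "site_b"),
]

annotations = annotate(queries, features)
print(annotations)
```

This linear scan is adequate for small inputs. For genome-scale annotation an indexed structure, such as an interval tree or binned lookup, would normally replace the inner loop.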
