Touring Ensembl: A practical guide to genome browsing

The number of databases in molecular biology has increased rapidly, providing a large-scale resource. Although valuable information is available, the data can be difficult to access, compare and integrate because web interfaces differ in format and presentation. This paper offers a practical guide to integrating gene, comparative genomics, and functional genomics data using the Ensembl website at http://www.ensembl.org.

The Ensembl genome browser and its underlying databases focus on chordate organisms. Other species, such as plants and microorganisms, can be investigated using our sister browser at http://www.ensemblgenomes.org.

This study uses four examples that sample many pages and features of the Ensembl browser. We focus on comparative studies across more than 50 mostly chordate organisms, variations linked to disease, functional genomics, and access to external information housed in databases outside the Ensembl project. Researchers will learn how to go beyond simply exporting one gene sequence and explore how a genome browser can integrate data from various sources and databases to build a comprehensive biological picture.
