The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes

Abstract This year's Database Issue of Nucleic Acids Research contains 152 papers: descriptions of 54 new databases and update papers on 98 databases, of which 16 have not previously been featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, a growing number of new and established databases address human health, from cancer-causing mutations to drugs and drug targets. In line with this trend, NAR reviewers and editors selected three recently compiled databases as ‘breakthrough’ contributions: denovo-db, the Monarch Initiative, and Open Targets. These cover, respectively, human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community, and we offer some lessons on what makes a successful database. The Database Issue is freely available online at https://academic.oup.com/nar. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/.
