NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

[1]  Kimberly Van Auken,et al.  WormBase: better software, richer content , 2005, Nucleic Acids Res..

[2]  A. Prats,et al.  Generation of protein isoform diversity by alternative initiation of translation at non‐AUG codons , 2003, Biology of the cell.

[3]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[4]  Tatiana A. Tatusova,et al.  Entrez Gene: gene-centered information at NCBI , 2004, Nucleic Acids Res..

[5]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[6]  Gregory D. Schuler,et al.  Database resources of the National Center for Biotechnology Information: update , 2004, Nucleic acids research.

[7]  David L. Wheeler,et al.  GenBank , 2015, Nucleic Acids Res..

[8]  Elizabeth M. Smigielski,et al.  dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..

[9]  Paul R Copeland,et al.  Regulation of gene expression by stop codon recoding: selenocysteine. , 2003, Gene.

[10]  Madeline A. Crosby,et al.  FlyBase: genes and gene models , 2004, Nucleic Acids Res..

[11]  Jungwon Yoon,et al.  The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community , 2003, Nucleic Acids Res..

[12]  G. Schuler,et al.  Entrez: molecular biology database and retrieval system. , 1996, Methods in enzymology.

[13]  The FlyBase database of the Drosophila genome projects and community literature. , 2003, Nucleic acids research.

[14]  Kara Dolinski,et al.  Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms , 2004, Nucleic Acids Res..

[15]  John B. Anderson,et al.  CDD: a curated Entrez database of conserved domain alignments , 2003, Nucleic Acids Res..

[16]  Judith A. Blake,et al.  The Mouse Genome Database (MGD): updates and enhancements , 2005, Nucleic Acids Res..

[17]  Tatiana A. Tatusova,et al.  Complete genomes in WWW Entrez: data representation and analysis , 1999, Bioinform..

[18]  Judith A. Blake,et al.  The Mouse Genome Database (MGD): integrating biology with the genome , 2004, Nucleic Acids Res..