RefSeq: an update on mammalian reference sequences

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on the growth of the mammalian and human subsets, changes to NCBI’s eukaryotic genome annotation pipeline and modifications affecting transcript and protein records. Recent changes to the pipeline provide higher throughput, and the addition of RNAseq data results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins, and we summarize the current status of the RefSeqGene project.
