Current status and new features of the Consensus Coding Sequence database

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Collaborators from NCBI, the Wellcome Trust Sanger Institute and the University of California, Santa Cruz provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth of the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting of the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which supporting evidence is incomplete. We also present a summary of recent and future curation targets.
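The core rule described above, that only coding regions annotated identically by both pipelines and passing quality assurance receive a tracked identifier, can be sketched as a small comparison routine. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the comparison key (chromosome, strand and coding exon coordinates), the placeholder QA check (coding length divisible by 3), the `IdRegistry` class and the `CCDS<n>.1` ID format used here are all hypothetical choices made for the example, and the transcript identifiers and coordinates are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical comparison key: chromosome, strand and the ordered coding exon intervals.
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


@dataclass(frozen=True)
class Annotation:
    """One CDS model from an annotation pipeline (illustrative fields only)."""
    source_id: str
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # 1-based inclusive coding intervals

    def key(self) -> CdsKey:
        # The source identifier is ignored: only the CDS structure is compared.
        return (self.chrom, self.strand, tuple(sorted(self.exons)))


def passes_qa(ann: Annotation) -> bool:
    """Placeholder QA check (assumption): coding length must be a multiple of 3."""
    length = sum(end - start + 1 for start, end in ann.exons)
    return length % 3 == 0


class IdRegistry:
    """Hypothetical stable-ID store: a given CDS key always maps to the same ID."""

    def __init__(self) -> None:
        self._ids: Dict[CdsKey, str] = {}

    def get_or_assign(self, key: CdsKey) -> str:
        if key not in self._ids:
            # Version suffix fixed at .1; version updates are not modeled here.
            self._ids[key] = f"CCDS{len(self._ids) + 1}.1"
        return self._ids[key]


def consensus(ncbi: List[Annotation], ensembl: List[Annotation],
              registry: IdRegistry) -> Dict[CdsKey, str]:
    """Return IDs for CDS structures annotated identically in both sets and passing QA."""
    ensembl_keys = {a.key() for a in ensembl}
    result: Dict[CdsKey, str] = {}
    for ann in ncbi:
        k = ann.key()
        if k in ensembl_keys and passes_qa(ann):
            result[k] = registry.get_or_assign(k)
    return result


# Toy inputs with made-up identifiers and coordinates.
ncbi = [
    Annotation("NM_example1", "chr1", "+", ((100, 129), (200, 229))),
    Annotation("NM_example2", "chr2", "-", ((10, 20),)),
]
ensembl = [
    # Same CDS as NM_example1, with exons listed in a different order.
    Annotation("ENST_example1", "chr1", "+", ((200, 229), (100, 129))),
    # Differs from NM_example2 in its end coordinate, so it does not match.
    Annotation("ENST_example2", "chr2", "-", ((10, 22),)),
]

registry = IdRegistry()
first = consensus(ncbi, ensembl, registry)
second = consensus(ncbi, ensembl, registry)  # re-running the comparison reuses the same ID
print(sorted(first.values()))  # prints ['CCDS1.1']
```

Keying the registry on the CDS structure rather than on either pipeline's transcript identifier is what keeps the assigned ID stable when the same structure reappears in a later comparison; how real CCDS IDs are versioned when a structure changes is not modeled in this sketch.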
