Assembly: a resource for assembled genomes at NCBI

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, and complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version that unambiguously identify the set of sequences making up a particular version of an assembly, and it tracks changes as genome assemblies are updated. The Assembly database reports metadata such as assembly names, summary statistics for the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) and the assembly update history. It also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the corresponding assembly in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly database directly or by browsing the available assemblies for a particular organism. Links in the Assembly database let users download sequence and annotation for current versions of genome assemblies from the NCBI genomes FTP site.
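The summary statistics listed above (contig and scaffold counts, contig N50, total sequence length, total gap length) can be illustrated with a minimal sketch. It assumes that scaffolds are plain nucleotide strings and that gaps are encoded as runs of 'N'; the `min_gap` threshold and the helper names `gap_regex`, `split_contigs`, `n50` and `assembly_stats` are hypothetical. NCBI derives its statistics from the assembly structure it stores, so its exact conventions for gaps and contigs may differ from this toy version.

```python
import re
from typing import Dict, Iterable, List


def gap_regex(min_gap: int = 1) -> "re.Pattern[str]":
    """Assumption (illustrative only): a gap is any run of >= min_gap 'N'/'n' characters."""
    return re.compile("[Nn]{%d,}" % min_gap)


def split_contigs(scaffold: str, min_gap: int = 1) -> List[str]:
    """Break a scaffold into contigs at each gap; drop empty pieces left by leading/trailing gaps."""
    return [piece for piece in gap_regex(min_gap).split(scaffold) if piece]


def n50(lengths: Iterable[int]) -> int:
    """Length of the piece at which the running sum, taken from longest to
    shortest, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_stats(scaffolds: List[str], min_gap: int = 1) -> Dict[str, int]:
    """Summary statistics for a toy assembly given as a list of scaffold strings."""
    gaps = gap_regex(min_gap)
    contigs = [c for s in scaffolds for c in split_contigs(s, min_gap)]
    return {
        "scaffolds": len(scaffolds),
        "contigs": len(contigs),
        # Assumption: total sequence length counts gap bases; contig lengths do not.
        "total_length": sum(len(s) for s in scaffolds),
        "total_gap_length": sum(len(m.group()) for s in scaffolds for m in gaps.finditer(s)),
        "contig_n50": n50(len(c) for c in contigs),
        "scaffold_n50": n50(len(s) for s in scaffolds),
    }


if __name__ == "__main__":
    toy = ["ACGTACGTNNNNNACGT", "ACGTACGTACGT", "AC"]
    print(assembly_stats(toy))
    # {'scaffolds': 3, 'contigs': 4, 'total_length': 31, 'total_gap_length': 5, 'contig_n50': 8, 'scaffold_n50': 17}
```

In the toy input, the first scaffold contains one 5-base gap, so the three scaffolds yield four contigs (8, 4, 12 and 2 bases). Contig N50 is computed over ungapped contig lengths, which is why it (8) is much smaller than the scaffold N50 (17).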