Bioinformatics resources from the National Center for Biotechnology Information: An integrated foundation for discovery

The National Center for Biotechnology Information (NCBI) provides access to more than 30 publicly available molecular biology resources, offering an effective discovery space through high levels of data integration among large-scale data repositories. The foundation for many services is GenBank®, a public repository of DNA sequences from more than 133,000 different organisms. GenBank is accessible through the Entrez retrieval system, which integrates data from the major DNA and protein sequence databases, along with resources for taxonomy, genome maps, sequence variation, gene expression, gene function and phenotypes, protein structure and domain information, and the biomedical literature via PubMed®. Computational tools allow scientists to analyze vast quantities of diverse data. The BLAST® sequence similarity programs are instrumental in identifying genes and genetic features. Other tools support mapping disease loci to the genome, identifying new genes, comparing genomes, and relating sequence data to model protein structures. A basic research program in computational molecular biology enhances the database and software tool development initiatives. Future plans include further data integration, enhanced genome annotation and protein classification, additional data types, and links to a wider range of resources.
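As an illustration of the integrated retrieval described above, the sketch below builds request URLs for NCBI's public E-utilities web interface to Entrez. The abstract does not mention E-utilities, so treating it as the access route is an assumption. The helper names `build_esearch_url` and `build_elink_url` and the example query are hypothetical. The code only constructs URLs and makes no network request.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service (assumed access route to Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL that searches one Entrez database for a text query."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_elink_url(dbfrom, db, uid):
    """Return an ELink URL that follows cross-database links for one record ID."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: human BRCA1 nucleotide records.
    print(build_esearch_url("nucleotide", "BRCA1[Gene Name] AND Homo sapiens[Organism]", retmax=5))
    # Hypothetical record ID; links a nucleotide record to related PubMed articles.
    print(build_elink_url("nuccore", "12345", "pubmed"))
```

Note that `build_elink_url` takes the source database first, then the target database, then the record ID, so the second call in the demo should be read against that order before reuse. Separating URL construction from fetching keeps the example runnable offline; a real client would also need to respect NCBI's usage guidelines.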
