H-InvDB in 2009: extended database and data mining resources for human genes and transcripts

We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource for human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides annotation for 219 765 human transcripts in 43 159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources: ‘Navigation search’, ‘H-InvDB Enrichment Analysis Tool (HEAT)’ and web service APIs. ‘Navigation search’ is an extended search system that enables complex searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set compared with the full set of H-InvDB representative transcripts. H-InvDB now offers SOAP and REST web service APIs that allow programs to use H-InvDB data directly, extending data accessibility for users.
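The kind of enrichment test HEAT performs can be illustrated with a minimal sketch. The abstract does not state which statistic HEAT uses, so the one-sided hypergeometric test and the Bonferroni correction below are assumptions, as are the function names, the input layout (a mapping from gene ID to annotation terms) and the gene IDs in the usage note. This is not HEAT's actual implementation.

```python
from math import comb


def hypergeom_enrichment_p(total, annotated_total, set_size, annotated_in_set):
    """One-sided hypergeometric tail: probability of seeing at least
    `annotated_in_set` annotated genes when `set_size` genes are drawn
    without replacement from `total` genes, `annotated_total` of which
    carry the annotation."""
    upper = min(set_size, annotated_total)
    denom = comb(total, set_size)
    numer = sum(
        comb(annotated_total, k) * comb(total - annotated_total, set_size - k)
        for k in range(annotated_in_set, upper + 1)
    )
    return numer / denom


def enriched_annotations(background, gene_set, alpha=0.05):
    """Return (term, p, adjusted_p) tuples for terms enriched in `gene_set`.

    `background` maps every gene ID in the reference set to a set of
    annotation terms (hypothetical layout, assumed for this sketch).
    Genes in `gene_set` that are absent from `background` are ignored.
    """
    query = set(gene_set) & set(background)
    terms = set().union(*background.values()) if background else set()
    n_tests = len(terms)
    results = []
    for term in terms:
        annotated_total = sum(1 for anns in background.values() if term in anns)
        annotated_in_set = sum(1 for g in query if term in background[g])
        p = hypergeom_enrichment_p(
            len(background), annotated_total, len(query), annotated_in_set
        )
        # Bonferroni correction (an assumption): multiply by the number of terms tested.
        p_adj = min(1.0, p * n_tests)
        if p_adj <= alpha:
            results.append((term, p, p_adj))
    # Most significant terms first.
    return sorted(results, key=lambda r: r[1])
```

For example, with a hypothetical background of 20 genes in which only `g0` to `g4` carry the term `kinase` and every gene carries `common`, calling `enriched_annotations(background, ["g0", "g1", "g2", "g3", "g4"])` reports `kinase` as enriched and drops `common`, because a term shared by every background gene cannot be over-represented in any subset.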
