VaProS: a database-integration approach for protein/genome information retrieval

Life science research now relies heavily on databases of genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by omics research is so vast that a computer-aided search of the data is now a prerequisite for starting a new study. In addition, a combined search across these databases can yield new ideas and hypotheses that can then be tested by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve the expert knowledge stored in the databases and build new hypotheses about their research targets. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effects of gene mutations. In this manuscript, we present the concept of VaProS, the databases and tools that can be accessed without any knowledge of database locations or data formats, and the power of the search, exemplified by an investigation of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.
