A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources.

In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive setting to integrate and share information on type 2 diabetes mellitus (T2DM) in adults. This case study demonstrates the benefits of semantic interoperability gained by integrating the scientific literature with biomedical data resources, such as the UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way by applying public terminological resources for diseases and proteins, together with other text-mining approaches. Finally, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits of the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as a Virtual Knowledge Broker.
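The cross-resource comparison described above can be illustrated with a minimal sketch. It does not reproduce the SESL triple store or its schema: the tuple layout, the `associated_with` predicate, the function names, and the placeholder gene identifiers (`GENE_A` and so on) are all assumptions made for illustration, and the data are invented placeholders rather than results from the study.

```python
from collections import defaultdict

# Hypothetical statements of the form (gene, predicate, disease, source).
# Gene names are placeholders, not findings from the SESL project.
TRIPLES = [
    ("GENE_A", "associated_with", "T2DM", "literature"),
    ("GENE_B", "associated_with", "T2DM", "literature"),
    ("GENE_E", "associated_with", "OTHER_DISEASE", "literature"),
    ("GENE_A", "associated_with", "T2DM", "UniProtKB"),
    ("GENE_C", "associated_with", "T2DM", "UniProtKB"),
    ("GENE_A", "associated_with", "T2DM", "GXA"),
    ("GENE_B", "associated_with", "T2DM", "GXA"),
    ("GENE_D", "associated_with", "T2DM", "GXA"),
]


def genes_by_source(triples, disease):
    """Group the genes associated with `disease` by the resource asserting them."""
    grouped = defaultdict(set)
    for gene, predicate, obj, source in triples:
        if predicate == "associated_with" and obj == disease:
            grouped[source].add(gene)
    return dict(grouped)


def compare_sources(by_source):
    """Return genes shared by all resources and genes unique to each resource."""
    gene_sets = list(by_source.values())
    shared = set.intersection(*gene_sets) if gene_sets else set()
    unique = {}
    for source, genes in by_source.items():
        # Union of the genes asserted by every other resource.
        others = set().union(
            *(g for s, g in by_source.items() if s != source)
        )
        unique[source] = genes - others
    return shared, unique


by_source = genes_by_source(TRIPLES, "T2DM")
shared, unique = compare_sources(by_source)
```

In a real deployment the same comparison would more likely be expressed as a query against the triple store; plain Python sets are used here only to keep the sketch self-contained.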
