Semantic Web-Based Integration of Cancer Pathways and Allele Frequency Data

We demonstrate the use of Semantic Web technology to integrate the ALFRED allele frequency database and the Starpath pathway resource. Linking population-specific genotype data with cancer-related pathway data is potentially useful, given the growing interest in personalized medicine and in exploiting pathway knowledge for cancer drug discovery. We model our data using the Web Ontology Language (OWL), drawing on ideas from two existing standard formats: BioPAX for pathway data and PML for allele frequency data. We store the data in an Oracle database using Oracle Semantic Technologies, and we query it using Oracle's rule-based inference engine and its SPARQL-like RDF query language. The ability to query across the domains of population genetics and pathways offers the potential to answer a number of cancer-related research questions. One example is identifying genetic variants that are associated with cancer pathways and whose frequency varies significantly between ethnic groups. Such information could be useful for designing clinical studies and for providing background data in personalized medicine. It could also assist with the interpretation of genetic analysis results, such as those from genome-wide association studies.
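The kind of cross-domain query described above can be illustrated with a small, self-contained Python sketch. It is not the authors' Oracle implementation. A plain set of (subject, predicate, object) tuples stands in for the RDF store, a hand-written forward-chaining function stands in for a user-defined inference rule, and the measure of between-population variation is an unweighted Wright-style F_ST, chosen here as an assumption. All identifiers (geneA, cancerPathway1, snp1, popA, the bp:/pml:/ex: predicates) and all frequency values are hypothetical placeholders, not ALFRED or Starpath content.

```python
from statistics import mean, pvariance

# Hypothetical toy data: NOT real ALFRED/Starpath records.
# "bp:" predicates loosely mimic BioPAX-style pathway membership and
# "pml:" predicates loosely mimic PML-style variant/frequency records;
# the predicate names themselves are made up for this sketch.
TRIPLES = {
    ("geneA", "bp:participatesIn", "cancerPathway1"),
    ("geneB", "bp:participatesIn", "otherPathway"),
    ("cancerPathway1", "rdf:type", "CancerPathway"),
    ("snp1", "pml:locatedIn", "geneA"),
    ("snp2", "pml:locatedIn", "geneA"),
    ("snp3", "pml:locatedIn", "geneB"),
}

# Frequency of one allele of a biallelic site, per population (hypothetical).
FREQS = {
    "snp1": {"popA": 0.10, "popB": 0.60, "popC": 0.35},
    "snp2": {"popA": 0.40, "popB": 0.45, "popC": 0.42},
    "snp3": {"popA": 0.05, "popB": 0.90, "popC": 0.50},
}
# Store each frequency as a reified record node, as an RDF model would.
for snp, by_pop in FREQS.items():
    for pop, f in by_pop.items():
        rec = f"freq_{snp}_{pop}"
        TRIPLES |= {(rec, "pml:ofVariant", snp),
                    (rec, "pml:inPopulation", pop),
                    (rec, "pml:frequency", f)}


def apply_rule(triples):
    """Hypothetical user-defined rule, applied until no new triples appear:
    (?snp locatedIn ?gene) AND (?gene participatesIn ?pw) => (?snp associatedWith ?pw)."""
    inferred = set(triples)
    while True:
        new = {
            (snp, "ex:associatedWith", pathway)
            for (snp, p1, gene) in inferred
            if p1 == "pml:locatedIn"
            for (g2, p2, pathway) in inferred
            if p2 == "bp:participatesIn" and g2 == gene
        } - inferred
        if not new:
            return inferred
        inferred |= new


def fst(freqs):
    """Unweighted Wright-style F_ST for one biallelic site: var(p) / (p_bar * (1 - p_bar))."""
    if not freqs:
        return 0.0
    p_bar = mean(freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # monomorphic across populations: no differentiation
    return pvariance(freqs) / (p_bar * (1 - p_bar))


def population_frequencies(triples, snp):
    """Collect all per-population frequencies recorded for one variant."""
    records = {s for (s, p, o) in triples if p == "pml:ofVariant" and o == snp}
    return [o for (s, p, o) in triples if s in records and p == "pml:frequency"]


def variants_in_cancer_pathways(triples, threshold):
    """Join the two domains: variants linked to a cancer pathway whose F_ST >= threshold."""
    cancer = {s for (s, p, o) in triples if p == "rdf:type" and o == "CancerPathway"}
    hits = {}
    for (snp, p, pathway) in triples:
        if p == "ex:associatedWith" and pathway in cancer:
            value = fst(population_frequencies(triples, snp))
            if value >= threshold:
                hits[snp] = round(value, 3)
    return hits


if __name__ == "__main__":
    print(variants_in_cancer_pathways(apply_rule(TRIPLES), threshold=0.05))
    # prints {'snp1': 0.183}
```

In this toy data, snp3 varies most between populations but is excluded because its gene is not in a cancer pathway, and snp2 is in a cancer pathway but varies little. Only the combination of the inferred variant-to-pathway link and the frequency records selects snp1, which is the point of integrating the two sources. In an RDF store, the same join would more naturally be written as a single graph-pattern query over the inferred triples.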
