Integrating Human Genome Variation Data: An Information System Approach

The goal of this work is to design and develop an Information System that integrates human genome variation data currently scattered across different repositories. The continuous and growing interest in knowledge about genome variations makes this topic extremely attractive to study from an Information Systems point of view. The system has been developed following a conceptual-model-based methodology. The conceptual model formally represents genome variation knowledge and unifies the definition and categorization of variations. Once established, the conceptual model is implemented in a database, the Human Genome Data Base (HGDB). The database acts as a unified repository of integrated variation information that will allow biologists to perform data retrieval tasks efficiently. Lastly, a loading module has been implemented using an extraction-transformation-load (ETL) strategy to integrate data from three relevant variation repositories: HapMap, Ensembl and COSMIC. An exploitation module for end users is also provided.
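To make the ETL idea concrete, the following is a minimal sketch of how records from the three repositories could be mapped onto a single table. It is not the HGDB schema or the authors' loading module: the `variation` table, its columns, the per-source field names, and every record value are hypothetical placeholders chosen only for illustration.

```python
import sqlite3

# Hypothetical extracted records. Field names and values are placeholders,
# not real exports from HapMap, Ensembl or COSMIC.
RAW = {
    "HapMap": [{"rs": "rs0000001", "chrom": "chr1", "pos": "100", "alleles": "A/G"}],
    "Ensembl": [{"name": "rs0000001", "seq_region": "1", "start": 100, "allele_string": "A/G"}],
    "COSMIC": [{"mutation_id": "COSM0000001", "chromosome": "1", "position": 250, "change": "C>T"}],
}


def strip_chr(chrom):
    """Normalize chromosome names such as 'chr1' to '1'."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def transform(source, rec):
    """Map one source-specific record onto the assumed unified row layout."""
    if source == "HapMap":
        ref, alt = rec["alleles"].split("/")
        return (rec["rs"], strip_chr(rec["chrom"]), int(rec["pos"]), ref, alt, source)
    if source == "Ensembl":
        ref, alt = rec["allele_string"].split("/")
        return (rec["name"], strip_chr(rec["seq_region"]), int(rec["start"]), ref, alt, source)
    if source == "COSMIC":
        ref, alt = rec["change"].split(">")
        return (rec["mutation_id"], strip_chr(rec["chromosome"]), int(rec["position"]), ref, alt, source)
    raise ValueError("unknown source: " + source)


def run_etl(conn, raw):
    """Transform every extracted record and load it; return the stored row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS variation (
               external_id TEXT NOT NULL,
               chromosome  TEXT NOT NULL,
               position    INTEGER NOT NULL,
               ref_allele  TEXT NOT NULL,
               alt_allele  TEXT NOT NULL,
               source      TEXT NOT NULL,
               PRIMARY KEY (external_id, source)
           )"""
    )
    rows = [transform(source, rec) for source, recs in raw.items() for rec in recs]
    conn.executemany("INSERT OR REPLACE INTO variation VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM variation").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(connection, RAW), "rows loaded")
```

Keeping the `source` column in the primary key is one possible design choice: it preserves the provenance of each record, so the same variation reported by two repositories can still be found with a single query on chromosome and position.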
