Human genome conceptual modeling: An ontological framework for the design and implementation of genomic Information Systems

The objective of the work presented in this paper is to design and develop an Information System that integrates genome information currently scattered across different repositories. Understanding biological, and specifically genomic, concepts is an attractive research topic, driven by the needs of domain experts. The system has been developed following a methodology based on conceptual modeling and ontological description. This work provides a conceptual model that formally represents genome knowledge, together with a detailed ontological description of the aspects that conceptual modeling cannot cover. Because the domain keeps evolving, a versioning system is necessary to keep track of changes. Once the conceptual model is established, it is implemented in a database, which acts as a unified repository of integrated information and will allow biologists to perform efficient retrieval tasks. Lastly, a loading module will be implemented, using an ETL (extraction, transformation and load) strategy, to integrate data from relevant variation repositories.
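
The ETL strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' loading module: the two sources, their field names, the `variation` table and every function name are hypothetical, chosen only to show records in different formats being extracted, normalized to one schema and loaded into a unified repository (here an in-memory SQLite database).

```python
import sqlite3

# Hypothetical exports from two variation repositories. Field names and
# values are illustrative and not taken from any real repository.
SOURCE_A = [
    {"gene": "BRCA1", "pos": "100", "ref": "a", "alt": "g"},
    {"gene": "BRCA2", "pos": "250", "ref": "c", "alt": "t"},
]
SOURCE_B = [
    {"gene_symbol": "brca1", "position": 100, "change": "A>G"},
    {"gene_symbol": "BRCA1", "position": 310, "change": "T>C"},
]


def transform_a(rec):
    """Normalize a SOURCE_A record to (gene, position, ref, alt)."""
    return (rec["gene"].upper(), int(rec["pos"]),
            rec["ref"].upper(), rec["alt"].upper())


def transform_b(rec):
    """Normalize a SOURCE_B record, splitting 'REF>ALT' into two fields."""
    ref, alt = rec["change"].split(">")
    return (rec["gene_symbol"].upper(), int(rec["position"]),
            ref.upper(), alt.upper())


def load(conn, rows, source):
    """Insert normalized rows; variations already present are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO variation (gene, position, ref, alt, source) "
        "VALUES (?, ?, ?, ?, ?)",
        [row + (source,) for row in rows],
    )


def run_etl():
    """Extract from both sources, transform, and load into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variation ("
        "gene TEXT, position INTEGER, ref TEXT, alt TEXT, source TEXT, "
        "UNIQUE (gene, position, ref, alt))"
    )
    load(conn, [transform_a(r) for r in SOURCE_A], "source_a")
    load(conn, [transform_b(r) for r in SOURCE_B], "source_b")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = run_etl()
    query = "SELECT * FROM variation ORDER BY gene, position"
    for row in conn.execute(query):
        print(row)
```

In this sketch the uniqueness constraint on the normalized fields is what performs the integration: a variation reported by both sources is stored once, attributed to the source loaded first. A real loader would need a more careful notion of identity between variations than this four-field key.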
