Model-Based Engineering Applied to the Interpretation of the Human Genome

In modern software engineering it is widely accepted that the use of Conceptual Modeling techniques provides an accurate description of the problem domain. Applying these techniques before developing their associated software representation (implementations) allows for the development of high quality software systems. The application of these ideas to new, challenging domains -as the one provided by the modern Genomics- is a fascinating task. In particular, this chapter shows how the complexity of human genome interpretation can be faced from a pure conceptual modeling perspective to describe and understand it more clearly and precisely. With that, we pretend to show that a conceptual schema of the human genome will allow us to better understand the functional and structural relations that exist between the genes and the DNA translation and transcription processes, intended to explain the protein synthesis. Genome, genes, alleles, genic mutations... all these concepts should be properly specified through the creation of the corresponding Conceptual Schema, and the result of these efforts is presented here. First, an initial conceptual schema is suggested. It includes a first version of the basic genomic notions intended to define those basic concepts that characterize the description of the Human Genome. A set of challenging concepts is detected: they refer to representations that require a more detailed specification. As the knowledge about the domain increases, the model evolution is properly introduced and justified, with the final intention of obtaining a stable, final version for the Conceptual Schema of the Human Genome. During this process, the more critical concepts are outlined, and the final decision adopted to model them adequately is discussed. Having such a Conceptual Schema enables the creation of a corresponding data base. This database could include the required contents needed to exploit bio-genomic information in the structured and precise way historically provided by the Database domains. That strategy is far from the current biological data source ontologies that are heterogeneous, imprecise and too often even inconsistent.