Conceptual data modelling for bioinformatics

Current research in the biosciences depends heavily on the effective exploitation of huge amounts of data. These data are held in disparate formats, are distributed across remote sites, and are described using the different vocabularies of various disciplines. Furthermore, data are often stored or distributed in formats that leave many important features of their structure and semantics implicit. Conceptual data modelling involves the development of implementation-independent models that capture and make explicit the principal structural properties of data. Entities such as a biopolymer or a reaction, and relationships between them such as catalyses, can be formalised using a conceptual data model. Because such models are independent of any implementation, they can be transformed in systematic ways for implementation on different platforms, e.g. traditional database management systems. This paper describes the basics of the most widely used conceptual modelling notations, the ER (entity-relationship) model and the class diagrams of the UML (unified modelling language), and illustrates their use through several examples from bioinformatics. In particular, models are presented for protein structures and motifs, and for genomic sequences.
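As a rough illustration of the idea, the entities and the relationship named above can be written down once in an implementation-neutral form and then mapped systematically onto a relational schema. The sketch below is not taken from the paper: the class names, attribute names (`polymer_id`, `reaction_id`, `name`, `description`), the identifiers `P1` and `R1`, and the helper functions `to_relational` and `reactions_catalysed_by` are all assumptions made for the example. It treats catalyses as a many-to-many relationship, which is one plausible reading rather than the paper's stated cardinality.

```python
import sqlite3
from dataclasses import dataclass


# Conceptual level: two entity types and one relationship type.
# All attribute names here are hypothetical, chosen only for illustration.
@dataclass(frozen=True)
class Biopolymer:
    polymer_id: str
    name: str


@dataclass(frozen=True)
class Reaction:
    reaction_id: str
    description: str


@dataclass(frozen=True)
class Catalyses:
    """Relationship instance linking one Biopolymer to one Reaction."""
    polymer_id: str
    reaction_id: str


def to_relational(biopolymers, reactions, catalyses):
    """Map the conceptual model onto tables in an in-memory SQLite database.

    Each entity type becomes a table keyed by its identifier; the
    many-to-many relationship becomes a link table with two foreign keys.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
    conn.executescript(
        """
        CREATE TABLE biopolymer (
            polymer_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE reaction (
            reaction_id TEXT PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE catalyses (
            polymer_id  TEXT NOT NULL REFERENCES biopolymer(polymer_id),
            reaction_id TEXT NOT NULL REFERENCES reaction(reaction_id),
            PRIMARY KEY (polymer_id, reaction_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO biopolymer VALUES (?, ?)",
        [(b.polymer_id, b.name) for b in biopolymers],
    )
    conn.executemany(
        "INSERT INTO reaction VALUES (?, ?)",
        [(r.reaction_id, r.description) for r in reactions],
    )
    conn.executemany(
        "INSERT INTO catalyses VALUES (?, ?)",
        [(c.polymer_id, c.reaction_id) for c in catalyses],
    )
    conn.commit()
    return conn


def reactions_catalysed_by(conn, polymer_id):
    """Return the identifiers of reactions catalysed by the given biopolymer."""
    rows = conn.execute(
        """
        SELECT r.reaction_id
        FROM reaction AS r
        JOIN catalyses AS c ON c.reaction_id = r.reaction_id
        WHERE c.polymer_id = ?
        ORDER BY r.reaction_id
        """,
        (polymer_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Made-up identifiers; the enzyme and reaction are only sample data.
    polymers = [Biopolymer("P1", "hexokinase")]
    rxns = [Reaction("R1", "glucose + ATP -> glucose 6-phosphate + ADP")]
    links = [Catalyses("P1", "R1")]
    db = to_relational(polymers, rxns, links)
    print(reactions_catalysed_by(db, "P1"))
```

The same three classes could equally be mapped to an object database or a flat-file format; only `to_relational` would change, which is the sense in which the conceptual layer stays independent of the platform.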
