Design of a data model for developing laboratory information management and analysis systems for protein production

Data management has emerged as one of the central issues in the high‐throughput processes of taking a protein target sequence through to a protein sample. To simplify this task, and following extensive consultation with the international structural genomics community, we describe here a model of the data related to protein production. The model is suitable for both large and small facilities for use in tracking samples, experiments, and results through the many procedures involved. The model is described in Unified Modeling Language (UML). In addition, we present relational database schemas derived from the UML. These relational schemas are already in use in a number of data management projects. Proteins 2005. © 2004 Wiley‐Liss, Inc.
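As an illustration of the kind of tracking the abstract describes, the following is a minimal sketch of samples and experiments linked so that a sample's history can be recovered. It is not the published model: the class names `Sample` and `Experiment`, their fields, and the helper `trace_history` are all hypothetical, chosen only to show how a sample can be followed through successive procedures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """A tracked physical sample (hypothetical fields, not from the paper)."""
    sample_id: str
    description: str


@dataclass
class Experiment:
    """One procedure that consumes input samples and yields output samples."""
    experiment_id: str
    procedure: str
    inputs: List[Sample] = field(default_factory=list)
    outputs: List[Sample] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)


def trace_history(sample: Sample, experiments: List[Experiment]) -> List[str]:
    """Return the procedures that led to `sample`, earliest first."""
    for exp in experiments:
        if any(out.sample_id == sample.sample_id for out in exp.outputs):
            history: List[str] = []
            # Walk back through every input of the producing experiment.
            for parent in exp.inputs:
                history.extend(trace_history(parent, experiments))
            history.append(exp.procedure)
            return history
    # No recorded experiment produced this sample: it is a starting material.
    return []


# Illustrative data only; identifiers and procedure names are made up.
target = Sample("S1", "target DNA")
clone = Sample("S2", "expression clone")
protein = Sample("S3", "purified protein")
experiments = [
    Experiment("E1", "cloning", [target], [clone], {"status": "ok"}),
    Experiment("E2", "expression and purification", [clone], [protein]),
]
print(trace_history(protein, experiments))
```

In a relational schema derived from such a model, the input and output lists would typically become link tables between a sample table and an experiment table; that mapping is an assumption here, not a description of the schemas presented in the paper.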
