Explorations into the Provenance of High Throughput Biomedical Experiments

The field of translational biomedical informatics seeks to integrate knowledge from basic science, directed research into diseases, and clinical insights into a form that can be used to discover effective treatments of diseases. We demonstrate methods and tools to generate RDF representations of a commonly used experimental description format, MAGE-TAB, and to map MAGE documents to two general-purpose provenance representations, OPM (Open Provenance Model) and PML (Proof Markup Language). Through a use case simulation and round-trip analysis of selected examples, we show that the data represented in MAGE documents can be completely represented in OPM and PML. The success in mapping MAGE documents into general-purpose provenance models shows promise for implementing the translational research provenance vision.
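
The round-trip idea can be illustrated with a minimal sketch. It is not the authors' tooling: the three-column table is a heavily simplified, hypothetical stand-in for a MAGE-TAB SDRF file, the `process0`-style node names and the `type` predicate are assumptions made for illustration, and the triples are plain Python tuples rather than real RDF. Only the edge names `used` and `wasGeneratedBy` are borrowed from OPM. The check passes when mapping the table to provenance-style triples and back reproduces the original rows.

```python
import csv
import io

# Hypothetical, heavily simplified SDRF-like table (tab-delimited).
SDRF_TEXT = (
    "Source Name\tProtocol REF\tExtract Name\n"
    "sample1\textraction\textract1\n"
    "sample2\textraction\textract2\n"
)


def read_rows(text):
    """Parse the tab-delimited text into a list of plain dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="\t")]


def rows_to_triples(rows):
    """Map each row to OPM-style edges around one process node per row."""
    triples = []
    for i, row in enumerate(rows):
        process = f"process{i}"  # illustrative node name, not from any standard
        triples.append((process, "type", row["Protocol REF"]))
        triples.append((process, "used", row["Source Name"]))
        triples.append((row["Extract Name"], "wasGeneratedBy", process))
    return triples


def triples_to_rows(triples):
    """Invert rows_to_triples, rebuilding one row per process node."""
    rows = {}
    for subject, predicate, obj in triples:
        if predicate == "type":
            rows.setdefault(subject, {})["Protocol REF"] = obj
        elif predicate == "used":
            rows.setdefault(subject, {})["Source Name"] = obj
        elif predicate == "wasGeneratedBy":
            rows.setdefault(obj, {})["Extract Name"] = subject
    # Dicts preserve insertion order, so rows come back in process order.
    return list(rows.values())


def round_trips(text):
    """Return True if table -> triples -> table loses no information."""
    original = read_rows(text)
    return triples_to_rows(rows_to_triples(original)) == original


if __name__ == "__main__":
    print(round_trips(SDRF_TEXT))
```

A real mapping would have to cover the full MAGE-TAB column set and emit actual RDF, but the comparison step stays the same: the representation is complete for a given example only if the reconstructed document equals the original.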
