Data governance in predictive toxicology: A review

Background
Recent advances in data storage and sharing for downstream processing in predictive toxicology have created a growing need for flexible data representations, secure and consistent data curation, and automated data quality checking. Toxicity prediction draws on multidisciplinary data: there are hundreds of collections of chemical, biological and toxicological data, widely dispersed across the open literature, professional research bodies and commercial companies. To better manage and make full use of such large amounts of toxicity data, there is a trend towards developing data governance functionality in predictive toxicology, formalising a set of processes to guarantee high data quality and better data management. In this paper, data quality refers mainly to quality in a data-storage sense (e.g. accuracy, completeness and integrity), not in a toxicological sense (e.g. the quality of experimental results).

Results
This paper reviews seven widely used predictive toxicology data sources and applications, focusing on their data governance aspects: data accuracy, data completeness, data integrity, metadata and its management, data availability and data authorisation. The review reveals current problems (e.g. the lack of systematic, standard measures of data quality) and desirable needs (e.g. better management and further use of captured metadata, and the development of flexible multi-level user access authorisation schemas) in the development of predictive toxicology data sources. These analytical results will help to address a significant gap in toxicology data quality assessment and lead to the development of novel frameworks for predictive toxicology data and model governance.

Conclusions
While the public data sources discussed are well developed, some gaps remain in the development of a data governance framework to support predictive toxicology. In this paper, data governance is identified as a new challenge in predictive toxicology; used well, it may provide a promising framework for developing high-quality, easily accessible toxicity data repositories. This paper also identifies important research directions that require further investigation in this area.
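The data-storage quality dimensions named above, completeness and integrity in particular, lend themselves to automated checking. The following is a minimal Python sketch, not taken from any of the reviewed data sources: it assumes a hypothetical record layout whose required fields are listed in `REQUIRED_FIELDS`, measures completeness as the fraction of required fields that are filled, and checks integrity by validating the check digit of each CAS Registry Number. All names here (`completeness`, `integrity_errors`, the field names) are illustrative assumptions.

```python
import re
from typing import List, Mapping, Sequence

# Hypothetical required fields for a toxicity record (an assumption for
# illustration, not the schema of any reviewed data source).
REQUIRED_FIELDS = ("cas_number", "chemical_name", "endpoint", "value", "units")

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_is_valid(cas: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight each digit by its position counted from the right, starting at 1;
    # the check digit is the weighted sum modulo 10.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def completeness(records: Sequence[Mapping[str, object]]) -> float:
    """Fraction of required fields that are present and non-empty across all records."""
    if not records:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in REQUIRED_FIELDS
        if record.get(field) not in (None, "")
    )
    return filled / (len(records) * len(REQUIRED_FIELDS))


def integrity_errors(records: Sequence[Mapping[str, object]]) -> List[str]:
    """List records whose CAS number fails the format or check-digit test."""
    errors = []
    for i, record in enumerate(records):
        cas = record.get("cas_number")
        if cas and not cas_is_valid(str(cas)):
            errors.append(f"record {i}: invalid CAS number {cas!r}")
    return errors


if __name__ == "__main__":
    # Numeric values are placeholders, not measured toxicity data.
    demo = [
        {"cas_number": "7732-18-5", "chemical_name": "water",
         "endpoint": "LD50", "value": 1.0, "units": "mg/kg"},
        {"cas_number": "50-00-1", "chemical_name": "formaldehyde",
         "endpoint": "LD50", "value": None, "units": "mg/kg"},
    ]
    print(completeness(demo))
    print(integrity_errors(demo))
```

In this example the missing `value` in the second record lowers completeness to 0.9, and `50-00-1` is flagged because the check digit for 50-00 is 0 (formaldehyde is 50-00-0). Real repositories would define their own required fields and integrity rules; the sketch only shows how such measures can be computed systematically rather than by inspection.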