Data Management Practices for Collaborative Research

The success of research in maternal–infant health, as in any scientific field, relies on the adoption of best practices for data and knowledge management. Prior work by our group and others has identified evidence-based solutions to many existing data management challenges, including cost-effective practices for ensuring high-quality data entry and for the proper construction and maintenance of data standards and ontologies. Quality assurance practices for data entry and processing are necessary to ensure that data are not degraded during processing, but these practices have not been widely adopted in the fields of psychology and biology. Furthermore, collaborative research is becoming more common. It often involves multiple laboratories, different scientific disciplines, numerous data sources, large data sets, and data sets from public and commercial sources. These factors present new challenges for data and knowledge management. Data security and privacy concerns increase because data may be accessed by investigators affiliated with different institutions. Collaborative groups must address the challenges of federating data access between the data-collecting sites and a centralized data management site. Merging ontologies across data sets can become a formidable task, especially in fields with evolving ontologies. The increased use of automated data acquisition can yield more data, but it can also increase the risk of introducing errors or systematic biases into the data. In addition, integrating data collected from different assay types often requires the development of new analysis tools. All of these challenges increase the cost and time spent on data management for a given project, and they raise the risk of degrading data quality. In this paper, we review these issues and discuss theoretical and practical approaches for addressing them.
