The effects of context on data quality in biomedical data reuse

Introduction

The collection of vast stores of electronically formatted information and advances in information processing technologies have led to dramatic changes in the conduct of biomedical science. In biology, a paradigm shift is underway, driven by an unprecedented flood of data, the emergence of shared research repositories, and advances in the application of data mining algorithms. As a result, the traditional model of scientific discovery, “formulate hypothesis, conduct experiment, evaluate results,” is being replaced with “collect and store data, mine for new hypotheses, confirm with data or supplemental experiment” (Han et al. 2002). In clinical medicine, similar developments have made possible a variety of secondary applications of extant clinical data, including physician decision support, outcomes assessment, document retrieval, and clinician performance evaluation. Thus far, the quality, or usefulness, of existing data for secondary uses has been approached with a focus on technical access and the mathematical format of data. Largely ignored in these efforts are the effects on quality introduced when data captured for one purpose are reused for another. This panel brings together four researchers with ongoing studies in the biomedical domain that focus on the effects of context on the quality of data for secondary uses. Drawing on empirical evidence, the panel will address (1) the effects of context on data reuse, (2) anticipating, identifying, and accounting for these effects, and (3) the future implications of large-scale data reuse for information professionals in the biomedical domain.