The DISTANCE model for collaborative research: distributing analytic effort using scrambled data sets.

BACKGROUND Data sharing is encouraged to fulfill the ethical responsibility to transform research data into public health knowledge, but it carries risks of improper disclosure and potential harm from the release of individually identifiable data. METHODS The study objective was to develop and implement a novel method for scientific collaboration and data sharing that distributes the analytic burden while protecting patient privacy. A procedure was developed in which an investigator external to an analytic coordinating center (ACC) can conduct original research following a protocol governed by a Publications and Presentations (P&P) Committee. The collaborating investigator submits a study proposal and, if it is approved, develops the analytic specifications using existing data dictionaries and templates. An original data set is prepared according to the specifications, and the external investigator is provided with a complete but de-identified and shuffled data set that retains all key data fields but obfuscates individually identifiable data and patterns; this "scrambled data set" provides a "sandbox" in which the external investigator can develop and test analytic code. The analytic code is then run against the original data at the ACC to generate output, which the external investigator uses in preparing a manuscript for journal submission. RESULTS The method has been successfully used with collaborators to produce many published papers and conference reports. CONCLUSION By distributing the analytic burden, this method can facilitate collaboration and expand analytic capacity, resulting in more science for less money.
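
The scrambled-data workflow described above can be illustrated with a minimal sketch. The abstract does not specify the shuffling algorithm used at the ACC, so the code below assumes one simple possibility: direct identifiers are dropped and each remaining column is permuted independently. The function names (`scramble`, `mean_hba1c`), the field names (`patient_id`, `hba1c`, `age`), and the sample records are all hypothetical and serve only as illustration.

```python
import random


def scramble(rows, id_fields=("patient_id",), seed=None):
    """Return a de-identified copy of `rows` with every column shuffled independently.

    Assumption: independent per-column permutation stands in for whatever
    shuffling procedure the coordinating center actually applies.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    # Keep every analytic field, drop direct identifiers.
    fields = [name for name in rows[0] if name not in id_fields]
    columns = {}
    for name in fields:
        values = [row[name] for row in rows]
        rng.shuffle(values)  # breaks the link between a value and its record
        columns[name] = values
    return [{name: columns[name][i] for name in fields} for i in range(len(rows))]


def mean_hba1c(rows):
    """Hypothetical analytic code written by the external investigator."""
    return sum(row["hba1c"] for row in rows) / len(rows)


# Hypothetical original data, held only at the coordinating center.
original = [
    {"patient_id": 101, "age": 54, "hba1c": 7.2},
    {"patient_id": 102, "age": 61, "hba1c": 8.1},
    {"patient_id": 103, "age": 47, "hba1c": 6.5},
    {"patient_id": 104, "age": 70, "hba1c": 9.0},
]

# Step 1: the external investigator receives only the scrambled sandbox.
sandbox = scramble(original, seed=0)

# Step 2: the investigator develops and tests code against the sandbox.
sandbox_result = mean_hba1c(sandbox)

# Step 3: the coordinating center runs the same code on the original data.
final_result = mean_hba1c(original)
```

In this sketch the sandbox keeps the same fields, row count, and per-column value sets as the original, so code that runs on it should also run on the real data. Because columns are permuted independently, associations between variables are not preserved, which is why only the output generated at the ACC from the original data would be suitable for a manuscript.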
