A Robust, Differentially Private Randomized Experiment for Evaluating Online Educational Programs With Sensitive Student Data (preprint)

Randomized controlled trials (RCTs) have long been the gold standard for evaluating the effect of a program, policy, or treatment on an outcome of interest. However, many RCTs assume that study participants are willing to share their (potentially sensitive) data, specifically their response to treatment. This assumption, while seemingly trivial, is becoming difficult to satisfy in the modern era, especially in online settings where a growing body of regulation protects individuals’ data. This paper presents a new, simple experimental design that satisfies differential privacy, one of the strongest notions of data privacy. In addition, drawing on work on noncompliance in experimental psychology, we show that our design is robust against “adversarial” participants who may distrust investigators with their personal data and provide contaminated responses to intentionally bias the results of the experiment. Under our new design, we propose unbiased and asymptotically normal estimators of the average treatment effect. We also present a doubly robust, covariate-adjusted estimator that uses pretreatment covariates (if available) to improve efficiency. We conclude by using the proposed experimental design to evaluate the effectiveness of online statistics courses at the University of Wisconsin-Madison during the Spring 2021 semester, when many classes were online due to COVID-19.
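The abstract does not spell out the privacy mechanism, so the following is only a minimal sketch of the general idea and not the paper's design. It assumes a binary outcome and uses classical randomized response: each participant reports the true outcome with probability `p_truth` and the opposite outcome otherwise, which gives local differential privacy with epsilon = ln(p_truth / (1 - p_truth)). All function names and parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def epsilon_for(p_truth):
    """Local DP parameter of binary randomized response with truth probability p_truth."""
    if not 0.5 < p_truth < 1.0:
        raise ValueError("p_truth must lie strictly between 0.5 and 1")
    return math.log(p_truth / (1.0 - p_truth))


def randomize(y, p_truth, rng):
    """Report the true binary outcome y with probability p_truth, else report 1 - y."""
    return y if rng.random() < p_truth else 1 - y


def debiased_mean(reports, p_truth):
    """Unbiased estimate of the true outcome mean from privatized reports.

    E[report] = (1 - p_truth) + (2 * p_truth - 1) * mean, so we invert that map.
    """
    raw = sum(reports) / len(reports)
    return (raw - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def simulate_ate(n, p_truth, mean_treated, mean_control, seed=0):
    """Simulate a completely hypothetical experiment and estimate the ATE.

    Each unit is assigned to treatment by a fair coin, draws a binary outcome
    from its arm's (assumed) success probability, and reports it through
    randomized response. Only the privatized reports are used for estimation.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.random() < 0.5
        y = int(rng.random() < (mean_treated if z else mean_control))
        (treated if z else control).append(randomize(y, p_truth, rng))
    return debiased_mean(treated, p_truth) - debiased_mean(control, p_truth)


if __name__ == "__main__":
    p = 0.75
    print("epsilon:", epsilon_for(p))
    print("estimated ATE:", simulate_ate(40000, p, 0.7, 0.5, seed=1))
```

In this sketch the investigator never sees a participant's true response, yet the difference of debiased means remains unbiased for the average treatment effect. The price is variance: dividing by `2 * p_truth - 1` inflates the standard error, so stronger privacy (smaller epsilon) calls for a larger sample.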
