Privacy-Preserving Learning Analytics: Challenges and Techniques

Educational data contains valuable information that can be harvested through learning analytics to provide new insights for a better education system. However, sharing or analyzing this data introduces privacy risks for the data subjects, who are mostly students. Existing work in the learning analytics literature identifies the need for privacy and poses interesting research directions, but fails to apply state-of-the-art privacy protection methods with quantifiable and mathematically rigorous privacy guarantees. This work aims to employ and evaluate such methods on learning analytics by approaching the problem from two perspectives: (1) the data is anonymized and then shared with a learning analytics expert, and (2) the learning analytics expert is given a privacy-preserving interface that governs her access to the data. We develop proof-of-concept implementations of privacy-preserving learning analytics tasks using both perspectives and run them on real and synthetic datasets. We also present an experimental study on the trade-off between individuals’ privacy and the accuracy of the learning analytics tasks.
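To make perspective (2) concrete, the following is a minimal sketch of what a privacy-preserving query interface could look like. It assumes differential privacy with the Laplace mechanism, which is one common way to obtain a quantifiable guarantee; the abstract does not state which mechanism the authors use. The class `PrivateGradebook`, its `noisy_count` method, and the record layout are hypothetical names chosen for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


class PrivateGradebook:
    """Hypothetical interface: the analyst sees noisy counts, never raw rows."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self._remaining = total_epsilon  # privacy budget left to spend
        self._rng = random.Random(seed)

    def noisy_count(self, predicate, epsilon):
        """Answer 'how many records satisfy predicate?' with Laplace noise."""
        if epsilon <= 0 or epsilon > self._remaining:
            raise ValueError("insufficient privacy budget")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Adding or removing one student changes a count by at most 1
        # (sensitivity 1), so the noise scale is 1 / epsilon.
        return true_count + laplace_noise(1.0 / epsilon, self._rng)
```

A smaller `epsilon` gives a stronger privacy guarantee but a noisier answer, which is the privacy versus accuracy trade-off the abstract refers to. Tracking the remaining budget is what lets the interface govern access: once the budget is spent, further queries are refused.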
