Design of a Privacy-Preserving Data Platform for Collaboration Against Human Trafficking

Case records on victims of human trafficking are highly sensitive, yet the ability to share such data is critical to evidence-based practice and policy development across government, business, and civil society. We present new methods to anonymize, publish, and explore such data, implemented as a pipeline generating three artifacts: (1) synthetic data mitigating the privacy risk that published attribute combinations might be linked to known individuals or groups; (2) aggregate data mitigating the utility risk that synthetic data might misrepresent statistics needed for official reporting; and (3) visual analytics interfaces to both datasets mitigating the accessibility risk that privacy mechanisms or analysis tools might not be understandable and usable by all stakeholders. We present our work as a design study motivated by the goal of transforming how the world's largest database of identified victims is made available for global collaboration against human trafficking.
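As a rough illustration of the privacy risk described above, the sketch below counts how many records share each combination of attribute values and keeps only the combinations shared by at least `k` records. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `combination_counts` and `reportable`, the threshold `k`, the `max_len` limit, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def combination_counts(records, max_len=2):
    """Count every combination of up to max_len (attribute, value) pairs.

    Hypothetical helper: each record is a dict of attribute -> value.
    """
    counts = Counter()
    for record in records:
        # Sort so the same combination always yields the same key.
        items = sorted(record.items())
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return counts


def reportable(counts, k):
    """Keep only combinations shared by at least k records (k is an assumed threshold)."""
    return {combo: n for combo, n in counts.items() if n >= k}


# Entirely made-up toy records, for illustration only.
records = [
    {"age_band": "18-20", "region": "A"},
    {"age_band": "18-20", "region": "A"},
    {"age_band": "18-20", "region": "B"},
    {"age_band": "30-38", "region": "B"},
]

counts = combination_counts(records, max_len=2)
safe = reportable(counts, k=2)

for combo, n in sorted(safe.items()):
    print(combo, n)
```

In this toy example, a combination that appears in only one record (such as the single record in the `30-38` age band) is dropped from the reportable set, since a rare combination is the kind of attribute pattern that could be linked to a known individual.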
