Privacy-Preserving Design of Data Processing Systems in the Public Transport Context

The public transport network of a region with more than 4 million inhabitants is run by a complex interplay of public and private actors. Travellers generate large amounts of data by buying and using various forms of tickets and passes, and analysing these data is of paramount importance for the governance and sustainability of the system. This manuscript reports the early results of the privacy analysis being undertaken on the clearing process in the Emilia-Romagna region of Italy, which will compute the compensations due when tickets are bought from one operator and used with another. It is shown by means of examples that the clearing data may be used to violate various privacy properties of users, as well as (technically equivalent) trade secrets of operators. The ensuing discussion has a twofold goal. First, it shows that existing solutions, surveyed both by reviewing the literature on general privacy-preserving techniques and by analysing similar scenarios under discussion in cities across the world, fall short: the former exhibit structural effectiveness deficiencies, while the latter are of limited applicability because they typically address less demanding requirements. Second, it traces a research path towards a more effective approach to privacy-preserving data management in the specific context of public transport, both by refining current sanitization techniques and by applying the privacy by design approach.
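
To make the re-identification risk concrete, the sketch below shows how an adversary who knows only a couple of a traveller's boardings (for instance, having seen the person at two stops) can link that knowledge to a pseudonymous card in clearing-style records and thereby recover the card's full, multi-operator travel history. The schema, field names and sample values are illustrative assumptions, not the actual clearing data analysed in the paper.

```python
# Minimal sketch of an auxiliary-information linkage attack on pseudonymous
# clearing records. All field names and sample values are hypothetical.

from collections import defaultdict

# Hypothetical clearing records: (pseudonym, boarding_stop, date, operator)
clearing_records = [
    ("card_0412", "Bologna FS",   "2015-03-02", "operator_A"),
    ("card_0412", "Imola Centro", "2015-03-02", "operator_B"),
    ("card_0897", "Bologna FS",   "2015-03-02", "operator_A"),
    ("card_0412", "Bologna FS",   "2015-03-03", "operator_A"),
    ("card_0897", "Ferrara",      "2015-03-03", "operator_A"),
]

def candidates(records, known_trips):
    """Return the pseudonyms whose history contains every trip the adversary knows."""
    trips_by_card = defaultdict(set)
    for card, stop, date, _operator in records:
        trips_by_card[card].add((stop, date))
    return [card for card, trips in trips_by_card.items() if known_trips <= trips]

# Two observed boardings already single out one card, exposing that card's
# complete travel history across both operators.
known = {("Bologna FS", "2015-03-02"), ("Imola Centro", "2015-03-02")}
print(candidates(clearing_records, known))  # -> ['card_0412']
```

The point of the sketch is that the pseudonym offers little protection once a handful of trips are known from outside sources, which is exactly the kind of auxiliary-information attack the examples in the paper illustrate on the clearing data.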
