A Data Mining Approach to Assess Privacy Risk in Human Mobility Data

Human mobility data are an important proxy to understand human mobility dynamics, develop analytical services, and design mathematical models for simulation and what-if analysis. Unfortunately, mobility data are very sensitive, since they may enable the re-identification of individuals in a database. Existing frameworks for privacy risk assessment provide data providers with tools to control and mitigate privacy risks, but they suffer from two main shortcomings: (i) they have a high computational complexity; (ii) the privacy risk must be recomputed every time new data records become available and for every selection of individuals, geographic areas, or time windows. In this article, we propose a fast and flexible approach to estimate privacy risk in human mobility data. The idea is to train classifiers to capture the relation between individual mobility patterns and the level of privacy risk of individuals. We show the effectiveness of our approach through an extensive experiment on real-world GPS data in two urban areas, and we investigate the relations between human mobility patterns and the privacy risk of individuals.
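The idea of learning privacy risk from mobility patterns can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the risk definition (worst-case re-identification probability 1/m, where m is the number of individuals whose visited locations contain the k locations known to an adversary), the two mobility features, the 0.5 threshold, the toy trajectories, and the 1-nearest-neighbor rule standing in for whatever classifiers the authors train.

```python
from itertools import combinations

def reidentification_risk(trajectories, k=2):
    """Assumed risk model: for each user, the worst case over all sets of k
    visited locations of 1 / (number of users whose locations contain that set)."""
    visited = {user: set(traj) for user, traj in trajectories.items()}
    risk = {}
    for user, locs in visited.items():
        worst = 0.0
        size = min(k, len(locs))  # adversary cannot know more than was visited
        for known in combinations(sorted(locs), size):
            known = set(known)
            matches = sum(1 for other in visited.values() if known <= other)
            worst = max(worst, 1.0 / matches)  # the user always matches, so matches >= 1
        risk[user] = worst
    return risk

def mobility_features(trajectory):
    """Hypothetical features: number of distinct locations, number of visits."""
    return (len(set(trajectory)), len(trajectory))

def risk_label(risk, threshold=0.5):
    """Discretize risk into two classes; the threshold is an arbitrary choice."""
    return "high" if risk >= threshold else "low"

def predict_1nn(train, features):
    """1-nearest-neighbor on squared Euclidean distance (stand-in classifier)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda pair: dist(pair[0], features))[1]

# Toy data: each trajectory is a sequence of visited location identifiers.
trajectories = {
    "u1": ["A", "B", "A", "B"],
    "u2": ["A", "B", "A"],
    "u3": ["A", "B", "C", "D", "E"],
    "u4": ["A", "B", "B"],
}

# Expensive step, done once: simulate the attack to obtain training labels.
risk = reidentification_risk(trajectories, k=2)
train = [(mobility_features(t), risk_label(risk[u])) for u, t in trajectories.items()]

# Cheap step, done for every new individual: predict the risk class from features only.
new_trajectory = ["A", "F", "G", "H", "A", "F"]
predicted = predict_1nn(train, mobility_features(new_trajectory))
print(risk)
print(predicted)
```

The point of the sketch is the split between the two steps: the combinatorial risk computation runs only on the training population, while new individuals are scored from their mobility features alone, without re-running the attack simulation.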
