Your Apps Give You Away

Understanding mobile app usage has become instrumental for service providers in optimizing their online services. Meanwhile, there is a growing privacy concern that users' app usage may uniquely reveal who they are. In this paper, we seek to understand how likely it is that a user can be uniquely re-identified in the crowd by the apps she uses. We systematically quantify the uniqueness of app usage via large-scale empirical measurements. By collaborating with a major cellular network provider, we obtained a city-scale anonymized dataset of mobile app traffic (1.37 million users, 2000 apps, 9.4 billion network connection records). Through extensive analysis, we show that the set of apps a user has installed is already highly unique: among users with more than 10 apps, 88% can be uniquely re-identified by 4 randomly chosen apps. The uniqueness level is even higher if we consider when and where the apps are used. We also observe that user attributes (e.g., gender, social activity, and mobility patterns) all have an impact on the uniqueness of app usage. Our work takes a first step towards understanding the unique app usage patterns of a large user population, paving the way for further research on developing privacy-protection techniques and building personalized online services.
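To make the uniqueness measure in the abstract concrete, the following is a minimal sketch of one way such a figure could be computed. It is an illustration under stated assumptions, not the paper's actual procedure: the names `installed`, `is_unique`, and `uniqueness` are hypothetical, each user is represented only by a set of app identifiers, and a user counts as uniquely re-identified when no other user in the whole population has all k sampled apps.

```python
import random


def is_unique(user, k, installed, rng):
    """Return True if k apps sampled from `user` are held by no other user.

    `installed` maps a user id to that user's set of app ids (assumed layout).
    """
    # Sort before sampling so results are reproducible for a given seed.
    sample = rng.sample(sorted(installed[user]), k)
    matches = [u for u, apps in installed.items()
               if all(app in apps for app in sample)]
    # The sampled user always matches, so uniqueness means exactly one match.
    return matches == [user]


def uniqueness(installed, k, min_apps=10, seed=0):
    """Fraction of users with more than `min_apps` apps singled out by k apps."""
    rng = random.Random(seed)
    eligible = [u for u, apps in installed.items() if len(apps) > min_apps]
    if not eligible:
        return 0.0
    hits = sum(is_unique(u, k, installed, rng) for u in eligible)
    return hits / len(eligible)


# Toy population (made-up data): u1 and u3 share the same app set.
toy = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"a", "b", "c"},
}
print(uniqueness(toy, k=3, min_apps=2))
```

On the toy population, sampling all three apps of each user leaves only u2 distinguishable, since u1 and u3 are identical. The linear scan over all users per query is chosen for clarity; at the scale described in the abstract, an inverted index from app to users would be a more realistic design.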
