Incognito: A Method for Obfuscating Web Data

Users leave a trail of their personal data, interests, and intents while surfing or sharing information on the Web. Web data can therefore reveal private or sensitive information about users through inference analysis. An inference attack can link information to a single individual even when the user identifiers in the Web data are encoded or removed. Several works have improved the privacy of Web data through obfuscation methods [How09, Dom09, Sha05, Che14]. However, these methods are not comprehensive, not generic enough to apply to arbitrary Web data, and not effective against adversarial attacks. To address these drawbacks, we propose a privacy-aware obfuscation method for Web data. We use probabilistic methods to predict the privacy risk of Web data, incorporating all key privacy aspects: uniqueness, uniformity, and linkability. Web data with high predicted risk are then obfuscated with semantically similar data to minimize the privacy risk. By adding differential privacy-based noise, our method resists an adversary who knows the datasets and the model's learned risk probabilities. An experimental study on two real Web datasets validates the significance and efficacy of our method. Our results indicate that the average privacy risk reaches 100% with as few as 10 sensitive Web entries, whereas our obfuscation method can reduce the privacy risk to as low as 0% at the cost of an average utility loss of 64.3%.
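The abstract says that high-risk Web data are selected for obfuscation and that the learned risk probabilities are protected with differential privacy-based noise, but it does not give the exact mechanism. The following is a minimal sketch assuming the standard Laplace mechanism: each predicted risk probability receives noise of scale sensitivity/epsilon and is clipped back to [0, 1]. The function names, the sensitivity of 1, the 0.5 threshold, and the sample entries are hypothetical illustrations, not the paper's parameters.

```python
import random
from typing import List, Optional, Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_risks(risks: Sequence[float], epsilon: float,
                  sensitivity: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Assumed Laplace mechanism: add noise of scale sensitivity / epsilon
    to each predicted risk probability, then clip to [0, 1].
    Clipping is post-processing, so it does not weaken the DP guarantee."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [min(1.0, max(0.0, r + laplace_noise(scale, rng))) for r in risks]


def select_high_risk(entries: Sequence[str], noisy_risks: Sequence[float],
                     threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (entry, noisy risk) pairs at or above a hypothetical threshold;
    these would be the candidates to replace with semantically similar data."""
    return [(e, r) for e, r in zip(entries, noisy_risks) if r >= threshold]


if __name__ == "__main__":
    # Hypothetical Web entries and model-predicted risk probabilities.
    entries = ["clinic near me", "weather today", "loan default help"]
    risks = [0.9, 0.1, 0.8]
    noisy = perturb_risks(risks, epsilon=1.0, rng=random.Random(0))
    print(select_high_risk(entries, noisy))
```

A smaller epsilon adds more noise, which hides more about the underlying risk probabilities but makes the choice of entries to obfuscate less accurate. Releasing noisy values for many entries composes, so under this assumed setup the total privacy budget grows with the number of entries perturbed.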

[1] Gerhard Weikum, et al. Probabilistic Prediction of Privacy Risks in User Search Histories, 2014, PSBD '14.

[2] Tsvi Kuflik, et al. PRAW—A PRivAcy model for the Web: Research Articles, 2005.

[3] Xiaochun Yang, et al. Protecting Individual Information Against Inference Attacks in Data Publishing, 2007, DASFAA.

[4] L. Sweeney, et al. Weaving Technology and Policy Together to Maintain Confidentiality, 1997, Journal of Law, Medicine & Ethics.

[5] Srdjan Capkun, et al. Quantifying Web-Search Privacy, 2014, CCS.

[6] Nikita Borisov, et al. Do You Hear What I Hear?: Fingerprinting Smart Devices Through Embedded Acoustic Components, 2014, CCS.

[7] Nitesh Saxena, et al. On the Privacy of Web Search Based on Query Obfuscation: A Case Study of TrackMeNot, 2010, Privacy Enhancing Technologies.

[8] Xiangyu Liu, et al. Acoustic Fingerprinting Revisited: Generate Stable Device ID Stealthily with Inaudible Sound, 2014, CCS.

[9] Martín Abadi, et al. Host Fingerprinting and Tracking on the Web: Privacy and Security Implications, 2012, NDSS.

[10] Urs Hengartner, et al. Privacy: Gone with the Typing! Identifying Web Users by Their Typing Patterns, 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[11] Carmela Troncoso, et al. OB-PWS: Obfuscation-Based Private Web Search, 2012, 2012 IEEE Symposium on Security and Privacy.

[12] Evimaria Terzi, et al. A Framework for Computing the Privacy Scores of Users in Online Social Networks, 2009, 2009 Ninth IEEE International Conference on Data Mining.

[13] Zuhair Bandar, et al. Sentence similarity based on semantic nets and corpus statistics, 2006, IEEE Transactions on Knowledge and Data Engineering.

[14] Krishna P. Gummadi, et al. R-Susceptibility: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities, 2016, SIGIR.

[15] Tsvi Kuflik, et al. PRAW - A PRivAcy model for the Web, 2005, J. Assoc. Inf. Sci. Technol.

[16] Roksana Boreli, et al. On the Effectiveness of Obfuscation Techniques in Online Social Networks, 2014, Privacy Enhancing Technologies.

[17] Walter Rudametkin, et al. Beauty and the Beast: Diverting Modern Web Browsers to Build Unique Browser Fingerprints, 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[18] Peter Eckersley, et al. How Unique Is Your Web Browser?, 2010, Privacy Enhancing Technologies.

[19] Felix C. Freiling, et al. Fingerprinting Mobile Devices Using Personalized Configurations, 2016, Proc. Priv. Enhancing Technol.

[20] Hassan Jameel Asghar, et al. POSTER: TouchTrack: How Unique are your Touch Gestures?, 2017, CCS.

[21] Horst Bunke, et al. Hidden Markov models: applications in computer vision, 2001.

[22] Ian R. Kerr, et al. Lessons from the Identity Trail: Anonymity, Privacy and Identity in a Networked Society, 2009.

[23] J. Doug Tygar, et al. Adversarial machine learning, 2011, AISec '11.

[24] Nikita Borisov, et al. Tracking Mobile Web Users Through Motion Sensors: Attacks and Defenses, 2016, NDSS.

[25] Arvind Narayanan, et al. De-anonymizing Web Browsing Data with Social Networks, 2017, WWW.

[26] Vitaly Shmatikov, et al. Robust De-anonymization of Large Sparse Datasets, 2008, 2008 IEEE Symposium on Security and Privacy (SP 2008).

[27] Helen Nissenbaum, et al. TrackMeNot: Resisting Surveillance in Web Search, 2015.

[28] Stratis Ioannidis, et al. BlurMe: inferring and obfuscating user gender based on ratings, 2012, RecSys.

[29] Philippe Golle, et al. Faking contextual data for fun, profit, and privacy, 2009, WPES '09.

[30] Søren Brunak, et al. Hidden Markov Models: Applications, 2001.

[31] Jian Pei, et al. Data Mining: Concepts and Techniques, 3rd edition, 2006.

[32] Josep Domingo-Ferrer, et al. h(k)-private Information Retrieval from Privacy-uncooperative Queryable Databases, 2009, Online Inf. Rev.

[33] L. Sweeney. Simple Demographics Often Identify People Uniquely, 2000.

[34] Petra Perner, et al. Data Mining - Concepts and Techniques, 2002, Künstliche Intell.

[35] Claude Castelluccia, et al. On the uniqueness of Web browsing history patterns, 2014, Ann. des Télécommunications.

[36] Alistair A. Young, et al. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, MICCAI 2017.

[37] Muhammad Ikram, et al. A first look at mobile Ad-Blocking apps, 2017, 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA).

[38] Wenyuan Xu, et al. AccelPrint: Imperfections of Accelerometers Make Smartphones Trackable, 2014, NDSS.

[39] Robert Tibshirani, et al. Estimating the number of clusters in a data set via the gap statistic, 2000.

[40] Nina Taft, et al. How to hide the elephant- or the donkey- in the room: Practical privacy against statistical inference for large data, 2013, 2013 IEEE Global Conference on Signal and Information Processing.