Superstring-Based Sequence Obfuscation to Thwart Pattern Matching Attacks

User privacy can be compromised by matching user data traces to records of their previous behavior. The matching of the statistical characteristics of traces to prior user behavior has been widely studied. However, an adversary can also identify a user deterministically by searching data traces for a pattern that is unique to that user. Our goal is to thwart such an adversary by applying small artificial distortions to data traces such that each potentially identifying pattern is shared by a large number of users. Importantly, in contrast to statistical approaches, we develop data-independent algorithms that require no assumptions on the model by which the traces are generated. By relating the problem to a set of combinatorial questions on sequence construction, we are able to provide provable guarantees for our proposed constructions. We also introduce data-dependent approaches for the same problem. The algorithms are evaluated on synthetic data traces and on the Reality Mining Dataset to demonstrate their utility.
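As an illustration of the kind of sequence construction the abstract alludes to, the sketch below builds a de Bruijn sequence, a classical string that contains every length-n pattern over an alphabet exactly once, and inserts it into each trace. It is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a "pattern" is a contiguous run of n symbols, and the names `de_bruijn`, `linear_superstring`, `obfuscate`, and `contains_all_patterns` are hypothetical helpers introduced only for this example.

```python
from itertools import product


def de_bruijn(alphabet, n):
    """Return a cyclic de Bruijn sequence B(k, n) as a list of symbols.

    Uses the standard recursive construction that concatenates Lyndon words.
    """
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return [alphabet[i] for i in seq]


def linear_superstring(alphabet, n):
    """Unroll the cyclic sequence so every length-n pattern is a plain substring."""
    cyc = de_bruijn(alphabet, n)
    return cyc + cyc[:n - 1]


def obfuscate(trace, superstring, position):
    """Hypothetical obfuscation step: splice the superstring into a trace."""
    return trace[:position] + superstring + trace[position:]


def contains_all_patterns(trace, alphabet, n):
    """Check that every length-n pattern occurs as a contiguous run in the trace."""
    windows = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
    return all(p in windows for p in product(alphabet, repeat=n))


if __name__ == "__main__":
    alphabet = ["0", "1"]
    n = 3
    s = linear_superstring(alphabet, n)
    print("".join(s))  # length k**n + n - 1 = 10

    # Two toy traces (assumed data): before obfuscation, "111" occurs only in the second.
    traces = [list("0100100"), list("0111000")]
    protected = [obfuscate(t, s, position=2) for t in traces]
    print([contains_all_patterns(t, alphabet, n) for t in protected])
```

After the splice, every length-3 pattern occurs in every trace, so under the contiguous-pattern assumption no such pattern can single out one user. The cost is k^n + n - 1 inserted symbols per trace, which grows quickly with n. That growth is one reason a shorter or data-dependent construction may be preferable in practice.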
