Select-Organize-Anonymize: A Framework for Trajectory Data Anonymization

Advances in positioning technologies, together with the wide adoption of GPS-enabled smartphones, enable accurate and low-cost tracking of user location. This allows the collection of large amounts of person-specific mobility data that offer remarkable opportunities for data analysis. Yet, the sharing of such data poses significant privacy risks. This underscores the need for privacy-preserving methods for publishing trajectory data. Existing approaches are either limited in their privacy specification component or they incur significant, and often unnecessary, data distortion. In response, we propose a novel framework for anonymizing trajectory data that prevents the disclosure of both identity and sensitive location information, while retaining data utility. Our framework involves: (i) selecting similar trajectories, by employing Z-ordering or data projections on frequent subtrajectories, (ii) organizing the selected trajectories into carefully constructed clusters, and (iii) anonymizing each cluster separately. We develop algorithms to realize our framework, which are effective and efficient, as verified by extensive experiments.
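
To make the selection step more concrete, the following is a minimal sketch of how Z-ordering can place spatially close trajectories next to each other. It is an illustration under stated assumptions, not the algorithm developed in this work: trajectories are assumed to be lists of non-negative integer grid cells `(x, y)`, and the names `interleave_bits`, `trajectory_key`, and `select_groups`, as well as the group size `k`, are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Trajectory = List[Point]


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) value of grid cell (x, y).

    Bit i of x is placed at position 2*i and bit i of y at position 2*i + 1,
    so cells that are close in 2D tend to receive close 1D values.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def trajectory_key(traj: Trajectory) -> Tuple[int, ...]:
    # Assumption: a trajectory is compared by the sequence of Z-values
    # of its points (lexicographic order). Other keys are possible.
    return tuple(interleave_bits(x, y) for x, y in traj)


def select_groups(trajectories: List[Trajectory], k: int) -> List[List[Trajectory]]:
    """Sort trajectories by their Z-order key and cut the order into runs of k.

    The last group may hold fewer than k trajectories; a real anonymization
    pipeline would have to merge or otherwise handle such a remainder.
    """
    ordered = sorted(trajectories, key=trajectory_key)
    return [ordered[i:i + k] for i in range(0, len(ordered), k)]


if __name__ == "__main__":
    data = [
        [(5, 5), (6, 6)],
        [(0, 0), (1, 1)],
        [(0, 1), (1, 1)],
        [(5, 4), (6, 6)],
    ]
    for group in select_groups(data, k=2):
        print(group)
```

In this toy input, the two trajectories starting near the origin end up in one group and the two starting near cell (5, 5) in the other, which is the kind of locality the selection step relies on before clusters are organized and anonymized.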
