Information Security in Big Data: Privacy and Data Mining

The growing popularity and development of data mining technologies pose a serious threat to the security of individuals' sensitive information. An emerging research topic in data mining, known as privacy-preserving data mining (PPDM), has been extensively studied in recent years. The basic idea of PPDM is to modify the data in such a way that data mining algorithms can be performed effectively without compromising the security of the sensitive information contained in the data. Current studies of PPDM mainly focus on how to reduce the privacy risk brought by data mining operations, while in fact unwanted disclosure of sensitive information may also happen in the process of data collection, data publishing, and information delivery (i.e., delivery of the data mining results). In this paper, we view the privacy issues related to data mining from a wider perspective and investigate various approaches that can help to protect sensitive information. In particular, we identify four different types of users involved in data mining applications, namely, data provider, data collector, data miner, and decision maker. For each type of user, we discuss that user's privacy concerns and the methods that can be adopted to protect sensitive information. We briefly introduce the basics of related research topics, review state-of-the-art approaches, and present some preliminary thoughts on future research directions. Besides exploring the privacy-preserving approaches for each type of user, we also review the game-theoretical approaches proposed for analyzing the interactions among different users in a data mining scenario, each of whom places their own valuation on the sensitive information. By differentiating the responsibilities of different users with respect to the security of sensitive information, we would like to provide some useful insights into the study of PPDM.
