Privacy-preserving publishing of opinion polls

Public opinion is the collective beliefs or thoughts of the public on a particular topic, especially politics, religion, or social issues. Opinions may be sensitive because they can reflect a person's perspective, understanding, particular feelings, way of life, and desires. On the one hand, public opinion is often collected through a central server that keeps a user profile for each participant and needs to publish this data for research purposes. On the other hand, publishing such sensitive information without proper de-identification puts individuals' privacy at risk, so opinions must be anonymized before they are published. While many anonymization approaches have been introduced for tabular data with a single sensitive attribute, they do not readily apply to opinion polls: opinions are generally collected on many issues, so opinion databases have multiple sensitive attributes. Finding and enforcing anonymization models that work on datasets with multiple sensitive attributes, while still allowing risk analysis on the publisher side, is not a well-studied problem. In this work, we identify the privacy problems regarding public opinions and propose a new probabilistic privacy model, MSA-diversity, defined specifically for datasets with multiple sensitive attributes. We also present a heuristic anonymization technique to enforce MSA-diversity. Experimental results on real data show that our approach clearly outperforms the existing approaches in terms of anonymization accuracy.
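
To make the multiple-sensitive-attribute problem concrete, the toy Python sketch below may help. It is not the MSA-diversity model, whose definition is not given here; as a stand-in it uses plain distinct l-diversity, checked separately on each sensitive attribute. The poll questions q1, q2, q3, the records, and the helper names are all hypothetical. Each question in the group is 2-diverse on its own, yet an adversary who already knows a target's answer to one question can filter the group and recover that target's other answers.

```python
# Hypothetical equivalence class: records that share the same generalized
# quasi-identifiers after anonymization. Each record holds a participant's
# answers to three poll questions, i.e. three sensitive attributes.
group = [
    {"q1": "yes", "q2": "agree",    "q3": "left"},
    {"q1": "no",  "q2": "disagree", "q3": "right"},
    {"q1": "yes", "q2": "neutral",  "q3": "centre"},
]


def distinct_l_diverse(records, attr, l):
    """True if sensitive attribute `attr` takes at least `l` distinct values."""
    return len({r[attr] for r in records}) >= l


def candidates_given(records, known):
    """Records consistent with the sensitive values the adversary already knows."""
    return [r for r in records if all(r[a] == v for a, v in known.items())]


# Checked one attribute at a time, the group looks 2-diverse.
per_attribute_ok = all(distinct_l_diverse(group, a, 2) for a in ("q1", "q2", "q3"))

# Background knowledge of a single answer narrows the group to one record,
# which discloses the target's answers to q1 and q3 as well.
leaked = candidates_given(group, {"q2": "agree"})
```

Here `per_attribute_ok` is `True` while `leaked` contains a single record, which is why treating each sensitive attribute in isolation, as single-attribute models would, is not enough for opinion data.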
