Big Privacy: Challenges and Opportunities of Privacy Study in the Age of Big Data

One of the biggest concerns of big data is privacy. However, the study of big data privacy is still at a very early stage. We believe that forthcoming solutions and theories of big data privacy will be rooted in the existing research output of the privacy discipline. Motivated by these factors, we extensively survey the existing research outputs and achievements of the privacy field from both applied and theoretical angles, aiming to lay a solid starting ground for interested readers to address the challenges of the big data case. We first present an overview of the battleground by defining the roles and operations of privacy systems. Second, we review the milestones of the two current major research categories of privacy: data clustering and privacy frameworks. Third, we discuss the efforts in privacy study from the perspectives of different disciplines. Fourth, we present the mathematical description, measurement, and modeling of privacy. We conclude by summarizing the challenges and opportunities of this promising topic, hoping to shed light on this exciting and almost uncharted land.
