Achieving Privacy-Utility Trade-off in existing Software Systems

Privacy and utility are two aspects of a data-driven system that often pull in opposite directions. Privacy concerns drive design decisions that reduce the ability to make deductions or correlations from a given dataset (e.g. lowering the probability that an individual can be re-identified from a set of health records). Utility, on the other hand, aims to maximize the chances of finding helpful real-world relationships that can then be used to build smarter systems (e.g. predicting that an individual is at higher risk of a terminal disease). This tension is commonly referred to as the privacy-utility trade-off. Software practitioners have often ignored privacy in the absence of legal obligations and have concentrated on achieving functionality. With the renewed interest in Artificial Intelligence, however, privacy concerns will become more important in the near future, forcing software providers to re-evaluate their existing products and services from a privacy perspective. In this work, we analyse some of the challenges that a typical software provider would face in doing so. We present a privacy model that can be applied to existing systems and can suggest first-cut privacy solutions requiring minimal alterations to deployed applications. To the best of our knowledge, no open-source initiative has yet been started to cater to these requirements. We briefly introduce the prototype of an open-source tool we are developing to facilitate this analysis. Initial results, obtained on standard datasets as well as a real-world credit card fraud dataset, appear to corroborate our intuitions.
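The trade-off described above can be made concrete with a minimal, self-contained sketch (not taken from the paper's tool): releasing a differentially private mean via the Laplace mechanism, where the privacy parameter epsilon directly controls how much noise is injected. The function names (`laplace_noise`, `private_mean`) and the example bounds are illustrative assumptions, not an API from the work itself.

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, epsilon, lower, upper):
    """Differentially private mean via the Laplace mechanism.

    A smaller epsilon gives stronger privacy but injects more noise,
    making the released estimate less useful -- the privacy-utility
    trade-off in action.
    """
    # Clip values to known bounds so the sensitivity of the mean is bounded.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)  # L1 sensitivity of the mean
    return true_mean + laplace_noise(sensitivity / epsilon)
```

For example, with `epsilon = 10` the released mean stays close to the true value (high utility, weak privacy), while with `epsilon = 0.5` the same query returns a much noisier, more private estimate.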
