Privacy: a machine learning view

The problem of disseminating a data set for machine learning while controlling the disclosure of data-source identity is formalized using a commuting diagram of functions. This formalization is used to state and analyze an optimization problem that balances privacy requirements against data utility. The analysis motivates a generalization mechanism for maintaining privacy while preserving the data's usefulness for machine learning. We present new proofs that minimizing information loss subject to a set of privacy requirements is NP-hard, both with and without an additional uniform coding requirement. As an initial analysis of the approximation properties of the problem, we show that the cell suppression problem with a constant number of attributes can be approximated within a constant factor. As a by-product, we obtain NP-hardness proofs for the minimum k-union and maximum k-intersection problems and for parallel versions of these; bounded versions of these problems are also shown to be approximable within a constant factor.
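The generalization and cell-suppression mechanisms referred to above are commonly instantiated as k-anonymity: every released record must agree on its quasi-identifying attributes with at least k-1 other records. The sketch below is a minimal illustration of suppression-based anonymization, with the number of suppressed cells serving as a crude information-loss measure. It is not the paper's algorithm (the paper shows that minimizing this loss optimally is NP-hard), and the function and field names (k_anonymize_by_suppression, zip, age, diagnosis) are hypothetical.

```python
from collections import Counter

def k_anonymize_by_suppression(records, quasi_identifiers, k):
    """Suppress quasi-identifier cells (replace with '*') in records whose
    quasi-identifier combination occurs fewer than k times; return the
    anonymized table and the number of suppressed cells (information loss)."""
    # Count how often each quasi-identifier combination appears.
    key = lambda r: tuple(r[a] for a in quasi_identifiers)
    counts = Counter(key(r) for r in records)

    anonymized, suppressed_cells = [], 0
    for r in records:
        if counts[key(r)] >= k:
            anonymized.append(dict(r))
        else:
            # Rare combination: suppress every quasi-identifier cell.
            masked = dict(r)
            for a in quasi_identifiers:
                masked[a] = "*"
                suppressed_cells += 1
            anonymized.append(masked)
    return anonymized, suppressed_cells

# Toy usage: 'zip' and 'age' are quasi-identifiers; 'diagnosis' is the payload.
records = [
    {"zip": "53711", "age": 34, "diagnosis": "flu"},
    {"zip": "53711", "age": 34, "diagnosis": "cold"},
    {"zip": "53715", "age": 61, "diagnosis": "asthma"},
]
table, loss = k_anonymize_by_suppression(records, ["zip", "age"], k=2)
print(loss)  # 2: both quasi-identifier cells of the unique (53715, 61) record were masked
```

A practical anonymizer would typically generalize values first (e.g. coarsening zip codes or binning ages) and fall back on suppression only when needed, since wholesale suppression destroys more of the utility that downstream learning depends on.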
