Simultaneous Pattern and Data Hiding in Unsupervised Learning

Controlling the level of knowledge disclosure and securing certain confidential patterns is a subtask of privacy-preserving data mining comparable to hiding confidential data values. We propose a technique that hides data values and confidential patterns simultaneously, without the undesirable side effect of distorting nonconfidential patterns. We use nonnegative matrix factorization to distort the original dataset while preserving its overall characteristics. A factor swapping method is designed to hide particular confidential patterns under k-means clustering. The effectiveness of this hiding technique is examined on a benchmark dataset. Experimental results indicate that the technique can produce a single modified dataset that achieves both pattern hiding and data value hiding. Under certain constraints on the nonnegative matrix factorization iterations, an optimal solution can be computed in which the user-specified confidential memberships or relationships are hidden without undesirable alterations to nonconfidential patterns.
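The two ingredients named above, distortion by nonnegative matrix factorization and factor swapping, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract specifies neither the factorization solver, the swapping rule, nor the constraints on the iterations. The sketch assumes standard multiplicative updates for the factorization A ≈ WH, treats the dominant entry of each row of W as a stand-in for that record's cluster membership (rather than running k-means), and uses a hypothetical `swap_factors` helper that exchanges the two largest factor values of one confidential record.

```python
import numpy as np

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate A by W @ H with W, H >= 0 (multiplicative updates).

    Assumption: the solver is a generic choice, not the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def swap_factors(W, row):
    """Hypothetical swapping rule: exchange the largest and second-largest
    entries in one row of W so that the record's dominant factor changes."""
    W2 = W.copy()
    order = np.argsort(W2[row])
    top, second = order[-1], order[-2]
    W2[row, [top, second]] = W2[row, [second, top]]
    return W2

# Toy nonnegative dataset: 8 records, 5 attributes (illustrative values only).
rng = np.random.default_rng(1)
A = rng.random((8, 5)) + 0.1

W, H = nmf(A, k=2)
A_distorted = W @ H               # data values are hidden by the low-rank approximation
W_hidden = swap_factors(W, row=0) # record 0 is assumed to be the confidential one
A_released = W_hidden @ H         # single dataset with both kinds of hiding applied
```

In this sketch only the row of the confidential record changes between `A_distorted` and `A_released`; all other records keep their factor values, which loosely mirrors the goal of leaving nonconfidential patterns unaltered.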
