A proposal of a privacy-preserving questionnaire by non-deterministic information and its analysis

We focus on questionnaires consisting of three-choice or, more generally, multiple-choice questions, and propose a privacy-preserving questionnaire based on non-deterministic information. Each respondent usually selects one of the available choices, and each response is stored as a tuple in a data table. The organizer of the questionnaire analyzes this table and obtains rules and tendencies. If the table contains personal information, the organizer needs to employ analytical procedures with privacy-preserving functionality. In this paper, we propose a new framework in which each respondent intentionally answers with non-deterministic information instead of deterministic information. For example, a respondent answers ‘either A, B, or C’ instead of the actual choice A, and thereby intentionally dilutes that choice. This concept may be similar to k-anonymity. Non-deterministic information is expected to be desirable for protecting each respondent's information. We follow the framework of Rough Non-deterministic Information Analysis (RNIA) and apply it to the privacy-preserving questionnaire. In current data mining algorithms, tuples with non-deterministic information may be removed during data cleaning. However, RNIA can handle such tuples as well as tuples with deterministic information. By using RNIA, we can consider new types of privacy-preserving questionnaires.
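As a rough illustration of how diluted answers can still be analyzed, the following minimal sketch stores each response as a set of possible choices and computes lower and upper bounds on the share of respondents who chose a given option. It is not the RNIA algorithm itself; the function name `support_bounds`, the set-based encoding, and the sample data are assumptions made only for this example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a deterministic answer is a singleton set,
# a diluted (non-deterministic) answer is a set with several choices.
Response = FrozenSet[str]


def support_bounds(responses: List[Response], choice: str) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the fraction of respondents choosing `choice`.

    Lower bound: respondents who answered exactly {choice}.
    Upper bound: respondents whose answer set contains choice.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    n = len(responses)
    certain = sum(1 for r in responses if r == frozenset({choice}))
    possible = sum(1 for r in responses if choice in r)
    return certain / n, possible / n


# Illustrative data only: four respondents, two of whom diluted their answers.
responses = [
    frozenset({"A"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B"}),
    frozenset({"A", "B"}),
]
bounds_a = support_bounds(responses, "A")
print(bounds_a)
```

In this sketch no tuple is discarded: a diluted answer widens the gap between the two bounds instead of being removed during data cleaning.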
