Knowledge as a service and knowledge breaching

In this paper, we introduce and explore a new computing paradigm we call knowledge as a service, in which a knowledge service provider, via its knowledge server, answers queries presented by knowledge consumers. The knowledge server's answers are based on knowledge models that may be expensive or impossible for the knowledge consumers to obtain. While this new paradigm of computing is promising, we must establish a solid foundation to ensure its utility. We focus on the security aspect of the paradigm, and particularly on a problem we call the knowledge breaching attack, which may allow an adversary to recover the knowledge underlying a knowledge service. Unless such an attack can be adequately handled, knowledge service providers would have no economic incentive to develop such a paradigm. Unfortunately, this paper shows theoretically that any interesting knowledge is subject to the knowledge breaching attack, and shows empirically that some knowledge models can be breached after a very small number of queries (e.g., queries covering 0.2-1% of the domain). Thus we need to investigate technical means that can alleviate such powerful attacks (at least for most practical knowledge models).
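To make the idea of a knowledge breaching attack concrete, the following is a minimal sketch under strong simplifying assumptions, not the construction or experiments of the paper. It assumes a hypothetical `KnowledgeServer` whose hidden knowledge model is a single integer threshold over a finite domain, and an adversary who may only submit queries and observe yes/no answers. The class, the function `breach_threshold`, and the chosen numbers are all illustrative.

```python
class KnowledgeServer:
    """Hypothetical knowledge server hiding a threshold model over 0..domain_size-1."""

    def __init__(self, threshold, domain_size):
        self._threshold = threshold      # the "knowledge" the provider wants to protect
        self.domain_size = domain_size
        self.queries_answered = 0

    def query(self, x):
        """Answer one consumer query: is x at or above the hidden threshold?"""
        self.queries_answered += 1
        return x >= self._threshold


def breach_threshold(server):
    """Recover the hidden threshold using only query answers (binary search)."""
    lo, hi = 0, server.domain_size       # invariant: the threshold lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if server.query(mid):
            hi = mid                     # threshold is at or below mid
        else:
            lo = mid + 1                 # threshold is above mid
    return lo


# Illustrative run: the adversary never sees `_threshold` directly.
server = KnowledgeServer(threshold=3721, domain_size=10_000)
recovered = breach_threshold(server)
fraction = server.queries_answered / server.domain_size
print(recovered, server.queries_answered, f"{fraction:.2%} of the domain")
```

For this toy model the number of queries grows only logarithmically with the domain size, so the adversary touches a tiny fraction of the domain before the model is fully recovered. Real knowledge models are more complex, and the query counts reported in the paper come from its own experiments rather than from this sketch.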
