Trust-but-Verify: Verifying Result Correctness of Outsourced Frequent Itemset Mining in Data-Mining-As-a-Service Paradigm

Cloud computing is popularizing the computing paradigm in which data is outsourced to a third-party service provider (server) for data mining. Outsourcing, however, raises a serious security issue: how can a client with weak computational power verify that the server has returned the correct mining result? In this paper, we focus on the specific task of frequent itemset mining. We consider a server that is potentially untrusted and tries to escape verification by using its prior knowledge of the outsourced data. We propose efficient probabilistic and deterministic verification approaches to check whether the server has returned correct and complete frequent itemsets. Our probabilistic approach can catch incorrect results with high probability, while our deterministic approach measures result correctness with 100 percent certainty. We also design efficient verification methods for the cases in which the data and the mining setup are updated. We demonstrate the effectiveness and efficiency of our methods using an extensive set of empirical results on real datasets.
