Integrity Verification of K-means Clustering Outsourced to Infrastructure as a Service (IaaS) Providers

The cloud-based infrastructure-as-a-service (IaaS) paradigm (e.g., Amazon EC2) enables a client who lacks computational resources to outsource her dataset and data mining tasks to the cloud. However, because the cloud may not be fully trusted, outsourcing raises serious concerns about the integrity of the mining results it returns. In this paper, we study how to verify the integrity of a k-means clustering task outsourced to an IaaS provider. We consider an untrusted, sloppy IaaS provider that returns incorrect clustering results by terminating the iterations early to save computational cost. We develop both probabilistic and deterministic verification methods to catch incorrect clustering results returned by the service provider. The deterministic method provides a 100% integrity guarantee at a cost much lower than that of executing k-means clustering locally, while the probabilistic method provides a probabilistic integrity guarantee at an even lower computational cost. Our experimental results show that our verification methods catch the sloppy service provider both effectively and efficiently.
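To make the threat model concrete, the following is a minimal sketch of one naive way a client could test whether a returned clustering looks like a converged k-means result. It is not the verification method developed in the paper; it only relies on the standard fact that a converged run of Lloyd's algorithm is a fixed point of one further iteration. The function names `lloyd_step` and `looks_converged`, the tolerance `tol`, and the toy data are assumptions made for illustration.

```python
import math


def lloyd_step(points, centroids):
    """Run one Lloyd iteration: assign points to nearest centroid, then average."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in points:
        # Nearest centroid by Euclidean distance (ties go to the lowest index).
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        labels.append(j)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    # Empty clusters keep their old centroid (an assumption of this sketch).
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return labels, new_centroids


def looks_converged(points, centroids, labels, tol=1e-9):
    """Return True if one more Lloyd iteration changes neither labels nor centroids."""
    new_labels, new_centroids = lloyd_step(points, centroids)
    if new_labels != list(labels):
        return False
    return all(math.dist(a, b) <= tol for a, b in zip(new_centroids, centroids))


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
honest = looks_converged(points, [(0.0, 0.5), (10.0, 0.5)], [0, 0, 1, 1])
sloppy = looks_converged(points, [(0.0, 0.0), (10.0, 1.0)], [0, 0, 1, 1])
```

This check costs one pass over the data per call, whereas running k-means locally costs one such pass per iteration. It only detects results that are not fixed points, so it says nothing about whether the provider started from the agreed initial centroids.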
