NEST: Locality-aware approximate query service for cloud computing

Cloud computing applications must handle huge volumes of data and need fast approximate queries to enhance system scalability and improve quality of service, especially when users do not know the exact query inputs. Locality-Sensitive Hashing (LSH) can support approximate queries, but it suffers from imbalanced load and space inefficiency across distributed data servers, which severely limits query accuracy and incurs long query latency between users and cloud servers. In this paper, we propose a novel scheme, called NEST, which offers an easy-to-use and cost-effective approximate query service for cloud computing. The novelty of NEST is to leverage cuckoo-driven locality-sensitive hashing to find similar items and place them close together, yielding load-balanced buckets in hash tables. NEST hence carries out flat and manageable addressing in adjacent buckets, and obtains constant-scale query complexity even in the worst case. The benefits of NEST include higher space utilization and faster query response. Theoretical analysis and extensive experiments in a large-scale cloud testbed demonstrate the salient properties of NEST in meeting the needs of approximate query service in cloud computing environments.
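The idea of combining LSH with cuckoo-style placement in adjacent buckets can be illustrated with a minimal sketch. This is not the NEST implementation: the class name `CuckooLSH`, the single flat table, the one-item buckets, the Gaussian projection hash, and all parameter values are assumptions chosen for illustration only. It shows how probing a fixed set of candidate buckets (each LSH bucket plus its two neighbours) keeps the number of probes per query constant.

```python
import random


class CuckooLSH:
    """Toy cuckoo-style LSH table (illustrative only, not the NEST code)."""

    def __init__(self, dim, num_buckets=64, num_hashes=2, width=4.0,
                 max_kicks=32, seed=0):
        self.rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.width = width
        self.max_kicks = max_kicks
        # One Gaussian projection per hash: h(v) = floor((a . v + b) / w)
        self.projections = [
            ([self.rng.gauss(0.0, 1.0) for _ in range(dim)],
             self.rng.uniform(0.0, width))
            for _ in range(num_hashes)
        ]
        self.buckets = [None] * num_buckets  # assumption: one item per bucket

    def _candidates(self, vec):
        """Each LSH bucket plus its left and right neighbours, deduplicated."""
        seen = []
        for a, b in self.projections:
            dot = sum(x * y for x, y in zip(a, vec))
            h = int((dot + b) // self.width)
            for offset in (0, -1, 1):
                idx = (h + offset) % self.num_buckets
                if idx not in seen:
                    seen.append(idx)
        return seen

    def insert(self, vec):
        """Return None on success, or the item left without a slot."""
        item = tuple(vec)
        for _ in range(self.max_kicks):
            cands = self._candidates(item)
            for idx in cands:
                if self.buckets[idx] is None:
                    self.buckets[idx] = item
                    return None
            # All candidates are full: evict a random occupant and retry with it.
            victim = self.rng.choice(cands)
            item, self.buckets[victim] = self.buckets[victim], item
        return item  # a real system would rehash or grow the table here

    def query(self, vec):
        """Probe at most 3 * num_hashes buckets, regardless of table size."""
        return [self.buckets[i] for i in self._candidates(vec)
                if self.buckets[i] is not None]
```

For example, after `t = CuckooLSH(dim=2)` and `t.insert([1.0, 2.0])`, the call `t.query([1.01, 2.0])` probes at most six buckets and returns the stored neighbour, because a small perturbation moves each hash value by at most one bucket. Returned candidates would still need a distance check in practice, since unrelated items can share a bucket.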
