A PG-LSH Similarity Search Method for Cloud Storage

Locality Sensitive Hashing (LSH) indexing is often used for similarity search over high-dimensional object sets. Because LSH keeps the index separate from the stored data, it is particularly well suited to searching encrypted data stored in the cloud. However, most LSH methods use or improve only the first phase of LSH for similarity search. In this paper, we focus on the second phase of LSH and propose a probability-guarantee LSH (PG-LSH) method, based on the Multi-Probe method, which we apply to searching encrypted data in cloud storage. Our approach simplifies the candidate set generation process and removes pseudo-similar objects. Experimental results show that the elimination rate of PG-LSH is more than 21%. PG-LSH also effectively reduces time and space costs without losing search quality.
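To make the two phases concrete, the following is a minimal sketch of a generic two-phase LSH search. It is not the PG-LSH method itself, whose details the abstract does not give. It assumes a p-stable (Gaussian) hash family for Euclidean distance and a single hash table. The class name `TwoPhaseLSH` and the parameters `num_hashes`, `bucket_width`, and `radius` are hypothetical names chosen for illustration. Phase 1 collects every object that shares the query's bucket; phase 2 removes candidates whose true distance exceeds the search radius, which is one simple way of discarding pseudo-similar objects.

```python
import math
import random
from collections import defaultdict


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


class TwoPhaseLSH:
    """Toy single-table LSH index (illustrative only, not PG-LSH)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One random Gaussian projection vector and one offset per hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)
        self.points = []

    def _key(self, v):
        # h(v) = floor((a . v + b) / w), concatenated over all hash functions.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, v):
        idx = len(self.points)
        self.points.append(list(v))
        self.buckets[self._key(v)].append(idx)
        return idx

    def candidates(self, q):
        # Phase 1: every stored object that falls in the query's bucket.
        return list(self.buckets.get(self._key(q), []))

    def search(self, q, radius):
        # Phase 2: keep only candidates whose true distance is within radius.
        return [i for i in self.candidates(q)
                if euclidean(self.points[i], q) <= radius]


if __name__ == "__main__":
    index = TwoPhaseLSH(dim=3)
    index.add([1.0, 2.0, 3.0])
    index.add([1.1, 2.0, 3.0])
    index.add([50.0, -40.0, 9.0])
    print(index.search([1.0, 2.0, 3.0], radius=0.5))
```

In this sketch the second phase is a plain distance check. A method such as PG-LSH would replace it with its own filtering rule.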
