Paper information - Locality Sensitive Hashing for Similarity Search Using MapReduce on Large Scale Data

Locality Sensitive Hashing for Similarity Search Using MapReduce on Large Scale Data

This paper describes a popular approach to the problem of similarity search: methods based on Locality Sensitive Hashing (LSH). To cope with large-scale data, these techniques are run on a distributed, parallel computing framework using the MapReduce paradigm, specifically its open source implementation, Apache Hadoop.
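The abstract does not say which LSH family or job layout the paper uses, so the following is only a minimal sketch under stated assumptions: MinHash signatures over character shingles, split into bands, with the map, shuffle, and reduce steps simulated in a single Python process rather than on Hadoop. All names and parameter values (`NUM_HASHES`, `BANDS`, `map_phase`, `reduce_phase`, `run_job`) are hypothetical and chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 20  # signature length (assumed value)
BANDS = 10       # number of bands; rows per band = NUM_HASHES // BANDS


def shingles(text, k=3):
    """Return the set of character k-grams of text (text must have >= k chars)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of a string under a given integer seed."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_signature(items):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return [min(seeded_hash(seed, x) for x in items) for seed in range(NUM_HASHES)]


def map_phase(doc_id, text):
    """Mapper: emit ((band index, band contents), doc_id) for every band."""
    sig = minhash_signature(shingles(text))
    rows = NUM_HASHES // BANDS
    for b in range(BANDS):
        band = tuple(sig[b * rows:(b + 1) * rows])
        yield (b, band), doc_id


def reduce_phase(key, doc_ids):
    """Reducer: documents that share a bucket become candidate pairs."""
    for pair in combinations(sorted(doc_ids), 2):
        yield pair


def run_job(docs):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    buckets = defaultdict(list)  # stands in for the framework's shuffle step
    for doc_id, text in docs.items():
        for key, value in map_phase(doc_id, text):
            buckets[key].append(value)
    candidates = set()
    for key, ids in buckets.items():
        candidates.update(reduce_phase(key, ids))
    return candidates


docs = {
    "a": "the quick brown fox",
    "b": "the quick brown fox",
    "c": "zzzzzz yyyyy xxxxx",
}
candidates = run_job(docs)
```

Documents that agree on every row of at least one band land in the same bucket, so only those pairs need an exact similarity check afterwards. More rows per band makes collisions between dissimilar documents rarer, while more bands makes it less likely that similar documents are missed.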

Radoslaw Szmit
