Efficient parallel set-similarity joins using MapReduce
Chen Li | Rares Vernica | Michael J. Carey
[1] David J. DeWitt,et al. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment , 1989, SIGMOD '89.
[2] Masaru Kitsuregawa,et al. Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC) , 1990, VLDB.
[3] David J. DeWitt,et al. An Evaluation of Non-Equijoin Algorithms , 1991, VLDB.
[4] David J. DeWitt,et al. Parallel database systems: the future of high performance database systems , 1992, CACM.
[5] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[6] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[7] Luis Gravano,et al. Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.
[8] Justin Zobel,et al. Methods for Identifying Versioned and Plagiarized Documents , 2003, J. Assoc. Inf. Sci. Technol.
[9] Sunita Sarawagi,et al. Efficient set joins on similarity predicates , 2004, SIGMOD '04.
[10] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[11] Mehran Sahami,et al. Evaluating similarity measures: a large-scale study in the orkut social network , 2005, KDD '05.
[12] Raghav Kaushik,et al. Efficient exact set-similarity joins , 2006, VLDB.
[13] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[14] Mehran Sahami,et al. A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.
[15] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.
[16] Douglas Stott Parker,et al. Map-reduce-merge: simplified relational data processing on large clusters , 2007, SIGMOD '07.
[17] Roberto J. Bayardo,et al. Scaling up all pairs similarity search , 2007, WWW '07.
[18] Divyakant Agrawal,et al. Detectives: detecting coalition hit inflation attacks in advertising networks streams , 2007, WWW '07.
[19] Xuemin Lin,et al. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints , 2008, Proc. VLDB Endow.
[20] Michael Stonebraker,et al. A comparison of approaches to large-scale data analysis , 2009, SIGMOD Conference.
[21] Hidehiko Tanaka,et al. Application of hash to data base machine and its architecture , 1983, New Generation Computing.
[22] Christopher Olston,et al. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience , 2009, Proc. VLDB Endow.
[23] Jeffrey Xu Yu,et al. Efficient similarity joins for near-duplicate detection , 2011, TODS.
[24] M. Carey,et al. Efficient Parallel Set-Similarity Joins Using MapReduce , 2011.