ED-JOIN: AN EFFICIENT ALGORITHM FOR SIMILARITY JOINS WITH EDIT DISTANCE CONSTRAINTS
[1] Eugene W. Myers,et al. A fast bit-vector algorithm for approximate string matching based on dynamic programming , 1998, JACM.
[2] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[3] Surajit Chaudhuri,et al. Data Debugger: An Operator-Centric Approach for Data Quality Solutions , 2006, IEEE Data Eng. Bull.
[4] Michael J. Fischer,et al. The String-to-String Correction Problem , 1974, JACM.
[5] Sunita Sarawagi,et al. Efficient set joins on similarity predicates , 2004, SIGMOD '04.
[6] Jeffrey Xu Yu,et al. Efficient similarity joins for near-duplicate detection , 2011, TODS.
[7] Sven Helmer,et al. Evaluation of Main Memory Join Algorithms for Joins with Set Comparison Join Predicates , 1996, VLDB.
[8] Salvatore J. Stolfo,et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.
[9] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[10] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.
[11] William E. Winkler,et al. The State of Record Linkage and Current Research Problems , 1999.
[12] Surajit Chaudhuri,et al. Example-driven design of efficient record matching queries , 2007, VLDB.
[13] Christian Böhm,et al. Epsilon grid order: an algorithm for the similarity join on massive high-dimensional data , 2001, SIGMOD '01.
[14] Luis Gravano,et al. Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.
[15] Mike Paterson,et al. A Faster Algorithm Computing String Edit Distances , 1980, J. Comput. Syst. Sci.
[16] Jeffrey F. Naughton,et al. Set Containment Joins: The Good, The Bad and The Ugly , 2000, VLDB.
[17] Pradeep Ravikumar,et al. Adaptive Name Matching in Information Integration , 2003, IEEE Intell. Syst.
[18] Kyuseok Shim,et al. Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance , 2007, VLDB.
[19] Justin Zobel,et al. Performance in Practice of String Hashing Functions , 1997, DASFAA.
[20] Xuemin Lin,et al. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints , 2008, Proc. VLDB Endow..
[21] Gonzalo Navarro,et al. A guided tour to approximate string matching , 2001, CSUR.
[22] Raghav Kaushik,et al. Efficient exact set-similarity joins , 2006, VLDB.
[23] Justin Zobel,et al. Inverted files for text search engines , 2006, CSUR.
[24] Dan Gusfield,et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997.
[25] Nikos Mamoulis,et al. Efficient processing of joins on set-valued attributes , 2003, SIGMOD '03.
[26] Patrick A. V. Hall,et al. Approximate String Matching , 1994, Encyclopedia of Algorithms.
[27] Bin Wang,et al. VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams , 2007, VLDB.
[28] Roberto J. Bayardo,et al. Scaling up all pairs similarity search , 2007, WWW '07.
[29] Hector Garcia-Molina,et al. Adaptive algorithms for set containment joins , 2003, TODS.
[30] Dan Gusfield,et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997.
[31] Divesh Srivastava,et al. Benchmarking declarative approximate selection predicates , 2007, SIGMOD '07.
[32] Alexandr Andoni,et al. Lower bounds for embedding edit distance into normed spaces , 2003, SODA '03.
[33] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[34] Jeffrey Xu Yu,et al. Efficient similarity joins for near duplicate detection , 2008, WWW.
[35] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[36] Juha Kärkkäinen,et al. One-Gapped q-Gram Filters for Levenshtein Distance , 2002, CPM.
[37] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.