HARRA: fast iterative hashed record linkage for large-scale data collections
[1] Craig A. Knoblock,et al. Learning Blocking Schemes for Record Linkage , 2006, AAAI.
[2] Roberto J. Bayardo,et al. Scaling up all pairs similarity search , 2007, WWW '07.
[3] Byung-Won On,et al. Comparative study of name disambiguation problem using a scalable blocking-based framework , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[4] Edith Cohen,et al. Finding interesting associations without support pruning , 2000, Proceedings of 16th International Conference on Data Engineering.
[5] S. Russell. Identity Uncertainty , 2010, Encyclopedia of Machine Learning.
[6] Lise Getoor,et al. Collective entity resolution in relational data , 2007, TKDD.
[7] Dongwon Lee,et al. Parallel linkage , 2007, CIKM '07.
[8] Jennifer Widom,et al. Swoosh: a generic approach to entity resolution , 2008, The VLDB Journal.
[9] Andrew McCallum,et al. Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.
[10] Alexandr Andoni,et al. Efficient algorithms for substring near neighbor problem , 2006, SODA '06.
[11] Edith Cohen,et al. Finding Interesting Associations without Support Pruning , 2001, IEEE Trans. Knowl. Data Eng.
[12] Divesh Srivastava,et al. Group Linkage , 2007, 2007 IEEE 23rd International Conference on Data Engineering.
[13] Peter Christen,et al. Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface , 2008, KDD.
[14] Xin Li,et al. Constraint-Based Entity Matching , 2005, AAAI.
[15] Alexandr Andoni,et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[16] Stuart J. Russell,et al. Identity Uncertainty and Citation Matching , 2002, NIPS.
[17] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[18] Rajeev Motwani,et al. Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.
[19] Sunita Sarawagi,et al. Efficient set joins on similarity predicates , 2004, SIGMOD '04.
[20] Zhe Wang,et al. Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search , 2007, VLDB.
[21] Jayant Madhavan,et al. Reference reconciliation in complex information spaces , 2005, SIGMOD '05.
[22] Raymond J. Mooney,et al. Adaptive Blocking: Learning to Scale Up Record Linkage , 2006, Sixth International Conference on Data Mining (ICDM'06).
[23] Jeffrey Xu Yu,et al. Efficient similarity joins for near-duplicate detection , 2011, TODS.
[24] Hector Garcia-Molina,et al. Generic Entity Resolution with Data Confidences , 2006, CleanDB.
[25] Paul M. B. Vitányi,et al. The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.
[26] Pradeep Ravikumar,et al. A Comparison of String Distance Metrics for Name-Matching Tasks , 2003, IIWeb.
[27] Chen Li,et al. Supporting Efficient Record Linkage for Large Data Sets Using Mapping Techniques , 2006, World Wide Web.
[28] Dmitri V. Kalashnikov,et al. Exploiting Relationships for Domain-Independent Data Cleaning , 2005, SDM.
[29] Raghav Kaushik,et al. Efficient exact set-similarity joins , 2006, VLDB.
[30] Panagiotis Papapetrou,et al. Nearest Neighbor Retrieval Using Distance-Based Hashing , 2008, 2008 IEEE 24th International Conference on Data Engineering.
[31] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[32] Luis Gravano,et al. Text joins in an RDBMS for web data integration , 2003, WWW '03.
[33] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[34] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.