MapDupReducer: detecting near duplicates over massive datasets
Jianmin Wang | Rui Li | Haixun Wang | Jun Xu | Xuemin Lin | Hongsong Li | Wei Wang | Chaokun Wang | Wanpeng Tian
[1] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[2] Joshua Alspector, et al. Improved robustness of signature-based near-replica detection via lexicon randomization, 2004, KDD.
[3] Jeffrey Xu Yu, et al. Efficient similarity joins for near-duplicate detection, 2011, TODS.
[4] Jimmy J. Lin. Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce, 2009, SIGIR.
[5] Ahmed K. Elmagarmid, et al. Duplicate Record Detection: A Survey, 2007, IEEE Transactions on Knowledge and Data Engineering.