Constructing a text corpus for inexact duplicate detection
[1] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[2] Hector Garcia-Molina, et al. Finding near-replicas of documents on the Web, 1999.
[3] Jean Carletta, et al. Assessing Agreement on Classification Tasks: The Kappa Statistic, 1996, CL.
[4] Jack G. Conrad, et al. Online duplicate document detection: signature reliability in a dynamic retrieval environment, 2003, CIKM '03.
[5] Howard R. Turtle. Natural language vs. Boolean query evaluation: a comparison of retrieval performance, 1994, SIGIR '94.
[6] Hector Garcia-Molina, et al. Finding Near-Replicas of Documents and Servers on the Web, 1998, WebDB.
[7] Ophir Frieder, et al. Collection statistics for fast duplicate document detection, 2002, TOIS.