Architecture for a Garbage-less and Fresh Content Search Engine
[1] Marc Najork, et al. Detecting phrase-level duplication on the world wide web, 2005, SIGIR '05.
[2] Sriram Raghavan, et al. Crawling the Hidden Web, 2001, VLDB.
[3] Marc Najork, et al. Detecting spam web pages through content analysis, 2006, WWW '06.
[4] Marc Najork, et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, 2004, WebDB '04.
[5] Fidel Cacheda, et al. Analysis and Detection of Web Spam by Means of Web Content, 2012, IRFC.
[6] B. Huberman, et al. The Deep Web: Surfacing Hidden Value, 2000.
[7] J. Ross Quinlan, et al. Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.
[8] Brian D. Davison, et al. Cloaking and Redirection: A Preliminary Study, 2005, AIRWeb.
[9] Hector Garcia-Molina, et al. Estimating frequency of change, 2003, TOIT.
[10] Andrei Z. Broder, et al. Sic transit gloria telae: towards an understanding of the web's decay, 2004, WWW '04.
[11] Hector Garcia-Molina, et al. Web Spam Taxonomy, 2005, AIRWeb.
[12] Brian D. Davison, et al. Identifying link farm spam pages, 2005, WWW '05.
[13] George Cybenko, et al. How dynamic is the Web?, 2000, Comput. Networks.
[14] Kumar Chellapilla, et al. A taxonomy of JavaScript redirection spam, 2007, AIRWeb '07.