Analysis and detection of Soft-404 pages
[1] Ricardo A. Baeza-Yates, et al. Characterization of national Web domains, 2007, TOIT.
[2] Ricardo Baeza-Yates, et al. Modern Information Retrieval: the concepts and technology behind search, Second edition, 2011.
[3] Antonio Gulli, et al. The indexable web is more than 11.5 billion pages, 2005, WWW '05.
[4] Ron Kohavi, et al. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, 1995, IJCAI.
[5] Christophe Bisciglia, et al. Cluster computing for web-scale data processing, 2008, SIGCSE '08.
[6] William C. Schmidt, et al. World-Wide Web survey research: Benefits, potential problems, and solutions, 1997.
[7] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[8] Chabane Djeraba, et al. High performance crawling system, 2004, MIR '04.
[9] Marc Najork, et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, 2004, WebDB '04.
[10] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SIGKDD Explor.
[11] Brian D. Davison, et al. Adversarial Web Search, 2011, Found. Trends Inf. Retr.
[12] Donghua Pan, et al. Web Page Content Extraction Method Based on Link Density and Statistic, 2008, 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing.
[13] Andrei Z. Broder, et al. Sic transit gloria telae: towards an understanding of the web's decay, 2004, WWW '04.
[14] Idit Keidar, et al. Do not crawl in the DUST: different URLs with similar text, 2006, WWW.
[15] Sriram Raghavan, et al. Crawling the Hidden Web, 2001, VLDB.
[16] B. Huberman, et al. The Deep Web: Surfacing Hidden Value, 2000.
[17] Frank M. Shipman, et al. Identifying "Soft 404" Error Pages: Analyzing the Lexical Signatures of Documents in Distributed Collections, 2012, TPDL.
[18] Hector Garcia-Molina, et al. Web Spam Taxonomy, 2005, AIRWeb.
[19] Kentaro Inui, et al. Development of a large-scale web crawler and search engine infrastructure, 2009, IUCS '09.
[20] Rajeev Motwani, et al. Stratified Planning, 2009, IJCAI.
[21] J. Ross Quinlan, et al. Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.
[22] Sung-Ryul Kim, et al. Detecting soft errors by redirection classification, 2009, WWW '09.
[23] Michael K. Bergman. White Paper: The Deep Web: Surfacing Hidden Value, 2001.