Efficient web harvesting strategies for monitoring deep web content
Djoerd Hiemstra | Maurice van Keulen | Mohammadreza Khelghati