Distributed and collaborative Web Change Detection system
Victor Carneiro | Fidel Cacheda | Manuel Álvarez | Víctor M. Prieto
[1] George Cybenko, et al. How dynamic is the Web?, 2000, Comput. Networks.
[2] M. Tamer Özsu, et al. A Poisson Model for User Accesses to Web Pages, 2003, ISCIS.
[3] C. Lee Giles, et al. A large-scale study of robots.txt, 2007, WWW '07.
[4] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[5] Sriram Raghavan, et al. Searching the Web, 2001, ACM Trans. Internet Techn.
[6] Hassan Abolhassani, et al. Freshness of Web search engines: Improving performance of Web search engines using data mining techniques, 2009, 2009 International Conference for Internet Technology and Secured Transactions (ICITST).
[7] Ricardo Baeza-Yates, et al. Characteristics of the Web of Spain, 2005.
[8] Marc Najork, et al. A large-scale study of the evolution of Web pages, 2004, Softw. Pract. Exp.
[9] Martin Halvey, et al. WWW '07: Proceedings of the 16th international conference on World Wide Web, 2007, WWW 2007.
[10] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[11] Ricardo A. Baeza-Yates, et al. Web Structure, Dynamics and Page Quality, 2002, SPIRE.
[12] Donghua Pan, et al. Web Page Content Extraction Method Based on Link Density and Statistic, 2008, 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing.
[13] B. Huberman, et al. The Deep Web: Surfacing Hidden Value, 2000.
[14] Hector Garcia-Molina, et al. Estimating frequency of change, 2003, TOIT.
[15] Ricardo A. Baeza-Yates, et al. Characterization of national Web domains, 2007, TOIT.
[16] Martin Höst, et al. Web server traffic in crisis conditions, 2005.
[17] Hector Garcia-Molina, et al. The Evolution of the Web and Implications for an Incremental Crawler, 2000, VLDB.
[18] Antonio Gulli, et al. The indexable web is more than 11.5 billion pages, 2005, WWW '05.
[19] Sandeep Pandey, et al. Recrawl scheduling based on information longevity, 2008, WWW.
[20] Hector Garcia-Molina, et al. Web Spam Taxonomy, 2005, AIRWeb.
[21] Adam Rifkin, et al. Nutch: A Flexible and Scalable Open-Source Web Search Engine, 2005.
[22] Chun Zhang, et al. Storing and querying ordered XML using a relational database system, 2002, SIGMOD '02.
[23] Marc Najork, et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, 2004, WebDB '04.
[24] Dirk Lewandowski, et al. A three-year study on the freshness of web search engine databases, 2008, J. Inf. Sci.
[25] Hector Garcia-Molina, et al. Synchronizing a database to improve freshness, 2000, SIGMOD '00.
[26] Anthony T. Holdener. Ajax: the definitive guide, 2008.
[27] Sriram Raghavan, et al. Crawling the Hidden Web, 2001, VLDB.