Effective web crawling
[1] Alistair Moffat,et al. Performance and Cost Tradeoffs in Web Search , 2004, ADC.
[2] Daniel Gomes,et al. A Characterization of the Portuguese Web , 2003 .
[3] John A. Tomlin,et al. A new paradigm for ranking pages on the world wide web , 2003, WWW '03.
[4] Ricardo A. Baeza-Yates,et al. Web Structure, Dynamics and Page Quality , 2002, SPIRE.
[5] Kurt Rothermel,et al. Maintaining Specialized Search Engines through Mobile Filter Agents , 1999, CIA.
[6] A. Jaimes,et al. On the image content of a web segment , 2004 .
[7] Hector Garcia-Molina,et al. Performance of Inverted Indices in Distributed Text Document Retrieval Systems , 1993 .
[8] Terrence A. Brooks,et al. Web search: how the Web has changed information retrieval , 2003, Information Research.
[9] Ricardo A. Baeza-Yates,et al. Crawling the Infinite Web: Five Levels Are Enough , 2004, WAW.
[10] Roy H. Campbell,et al. Internet search engine freshness by Web server help , 2001, Proceedings 2001 Symposium on Applications and the Internet.
[11] Martin van den Berg,et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery , 1999, Comput. Networks.
[12] Lada A. Adamic,et al. Evolutionary Dynamics of the World Wide Web , 1999 .
[13] Sriram Raghavan,et al. Searching the Web , 2001, ACM Trans. Internet Techn..
[14] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[15] Albert,et al. Emergence of scaling in random networks , 1999, Science.
[16] Anja Feldmann,et al. Potential benefits of delta encoding and data compression for HTTP , 1997, SIGCOMM '97.
[17] Marco Gori,et al. A unified probabilistic framework for Web page scoring systems , 2004, IEEE Transactions on Knowledge and Data Engineering.
[18] Saul Greenberg,et al. Revisitation patterns in World Wide Web navigation , 1997, CHI.
[19] Torsten Suel,et al. Design and implementation of a high-performance distributed Web crawler , 2002, Proceedings 18th International Conference on Data Engineering.
[20] Brigitte Trousse,et al. Advanced data preprocessing for intersites Web usage mining , 2004, IEEE Intelligent Systems.
[21] Berthier A. Ribeiro-Neto,et al. CoBWeb-a crawler for the Brazilian Web , 1999, 6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268).
[22] Binzhang Liu. Characterizing Web Response Time , 1998 .
[23] Marc Najork,et al. Mercator: A scalable, extensible Web crawler , 1999, World Wide Web.
[24] Ricardo A. Baeza-Yates,et al. Web page ranking using link attributes , 2004, WWW Alt. '04.
[25] M. Kendall. Rank Correlation Methods , 1949 .
[26] Ricardo A. Baeza-Yates,et al. Web Dynamics, Structure, and Page Quality , 2004, Web Dynamics.
[27] Marc Najork,et al. On near-uniform URL sampling , 2000, Comput. Networks.
[28] Huberman,et al. Strong regularities in world wide web surfing , 1998, Science.
[29] Brian D. Davison. Topical locality in the Web , 2000, SIGIR '00.
[30] Serge Abiteboul,et al. Adaptive on-line page importance computation , 2003, WWW '03.
[31] Sebastiano Vigna,et al. The webgraph framework I: compression techniques , 2004, WWW '04.
[32] Hector Garcia-Molina,et al. Efficient Crawling Through URL Ordering , 1998, Comput. Networks.
[33] Zhen Liu,et al. Optimal Robot Scheduling for Web Search Engines , 1998 .
[34] Ricardo A. Baeza-Yates,et al. Content-Based Image Retrieval and Characterization on Specific Web Collections , 2004, CIVR.
[35] Jon Kleinberg,et al. Authoritative sources in a hyperlinked environment , 1999, SODA '98.
[36] Hector Garcia-Molina,et al. The Evolution of the Web and Implications for an Incremental Crawler , 2000, VLDB.
[37] Kevin S. McCurley,et al. Ranking the web frontier , 2004, WWW '04.
[38] Hector Garcia-Molina,et al. Parallel crawlers , 2002, WWW.
[39] Vipin Kumar,et al. Discovery of Web Robot Sessions Based on their Navigational Patterns , 2004, Data Mining and Knowledge Discovery.
[40] Jeffrey Scott Vitter,et al. Characterizing Web Document Change , 2001, WAIM.
[41] Ricardo A. Baeza-Yates. Challenges in the Interaction of Information Retrieval and Natural Language Processing , 2004, CICLing.
[42] Iadh Ounis,et al. A utility-oriented hyperlink analysis model for the Web , 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[43] Sriram Raghavan,et al. Crawling the Hidden Web , 2001, VLDB.
[44] Anja Feldmann,et al. Rate of Change and other Metrics: a Live Study of the World Wide Web , 1997, USENIX Symposium on Internet Technologies and Systems.
[45] Giles,et al. Searching the world wide Web , 1998, Science.
[46] Danny B. Lange,et al. Seven good reasons for mobile agents , 1999, CACM.
[47] Marina Buzzi,et al. Cooperative crawling , 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[48] B. Huberman,et al. Surfing as a real option , 1998, ICE '98.
[49] Ricardo A. Baeza-Yates,et al. Scheduling algorithms for Web crawling , 2004, WebMedia and LA-Web, 2004. Proceedings.
[50] Jenny Edwards,et al. An adaptive model for optimizing performance of an incremental web crawler , 2001, WWW '01.
[51] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[52] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[53] Andrei Z. Broder,et al. A Comparison of Techniques to Find Mirrored Hosts on the WWW , 2000, IEEE Data Eng. Bull..
[54] Torsten Suel,et al. Server-Friendly Delta Compression for Efficient Web Access , 2003, WCW.
[55] Ricardo A. Baeza-Yates,et al. Evolution of the Chilean Web structure composition , 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[56] M. Koster,et al. Robots in the Web: threat or treat? , 1995, WWW Spring 1995.
[57] Hector Garcia-Molina,et al. Effective page refresh policies for Web crawlers , 2003, TODS.
[58] Diomidis Spinellis,et al. The decay and failures of web references , 2003, CACM.
[59] Sebastiano Vigna,et al. UbiCrawler: a scalable fully distributed Web crawler , 2004, Softw. Pract. Exp..
[60] Ravi Kumar,et al. Self-similarity in the web , 2001, TOIT.
[61] Hector Garcia-Molina,et al. Synchronizing a database to improve freshness , 2000, SIGMOD 2000.
[62] J. M. Bevan,et al. Rank Correlation Methods , 1949 .
[63] Carlos Castillo. Cooperation schemes between a Web server and a Web search engine , 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[64] Krishna Bharat,et al. SPHINX: A Framework for Creating Personal, Site-Specific Web Crawlers , 1998, Comput. Networks.
[65] Knut Magne Risvik,et al. Search engines and Web dynamics , 2002, Comput. Networks.
[66] Luis Gravano,et al. STARTS: Stanford Proposal for Internet Meta-Searching (Experience Paper) , 1997, SIGMOD Conference.
[67] Marc Najork,et al. Breadth-first crawling yields high-quality pages , 2001, WWW '01.
[68] Jiming Liu,et al. Characterizing Web usage regularities with information foraging agents , 2004, IEEE Transactions on Knowledge and Data Engineering.
[69] Jerome Talim,et al. Controlling the robots of Web search engines , 2001, SIGMETRICS '01.
[70] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2004, Softw. Pract. Exp..
[71] David Eichmann,et al. The RBSE spider — Balancing effective search against Web load , 1994, WWW Spring 1994.
[72] Ricardo A. Baeza-Yates,et al. On the image content of the Chilean Web , 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[73] David W. Brooks,et al. “Link rot” limits the usefulness of web‐based educational materials in biochemistry and molecular biology , 2003 .
[74] Junghoo Cho,et al. Page quality: in search of an unbiased web ranking , 2005, SIGMOD '05.
[75] George Cybenko,et al. How dynamic is the Web? , 2000, Comput. Networks.
[76] Ricardo A. Baeza-Yates,et al. On the Image Content of a Web Segment: Chile as a Case Study , 2004, J. Web Eng..
[77] Jon M. Kleinberg,et al. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text , 1998, Comput. Networks.
[78] Marios D. Dikaiakos,et al. Design and Implementation of a Distributed Crawler and Filtering Processor , 2002, NGITS.
[79] Gerard Salton,et al. The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .
[80] Albert-László Barabási,et al. The physics of the Web , 2001 .
[81] Ricardo A. Baeza-Yates,et al. Relating Web Characteristics with Link Based Web Page Ranking , 2001, SPIRE.
[82] Torsten Suel,et al. Compressing the graph structure of the Web , 2001, Proceedings DCC 2001. Data Compression Conference.
[83] Oliver A. McBryan,et al. GENVL and WWWW: Tools for taming the Web , 1994, WWW Spring 1994.
[84] James E. Pitkow,et al. Characterizing Browsing Behaviors on the World-Wide Web , 1995 .
[85] Edward A. Fox,et al. Web Traffic Latency: Characteristics and Implications , 1998, J. Univers. Comput. Sci..
[86] Sebastiano Vigna,et al. Do Your Worst to Make the Best: Paradoxical Effects in PageRank Incremental Computations , 2004, WAW.
[87] Taher H. Haveliwala. Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..
[88] Ricardo A. Baeza-Yates,et al. Balancing Volume, Quality and Freshness in Web Crawling , 2002, HIS.
[89] Ricardo A. Baeza-Yates,et al. Dynamics of the Chilean Web Structure , 2004, WebDyn@WWW.
[90] Eli Upfal,et al. Stochastic models for the Web graph , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.
[91] Dik Lun Lee,et al. Search and ranking algorithms for locating resources on the World Wide Web , 1996, Proceedings of the Twelfth International Conference on Data Engineering.
[92] Wallace Koehler,et al. A longitudinal study of Web pages continued: a consideration of document persistence , 2003, Inf. Res..
[93] Béla Bollobás,et al. Random Graphs , 1985 .
[94] Luis Gravano,et al. STARTS: Stanford proposal for Internet meta-searching , 1997, SIGMOD '97.
[95] C. Lee Giles,et al. Accessibility of information on the Web , 2000, INTL.
[96] Susan Haigh,et al. Measuring Web Site Usage: Log File Analysis , 1998 .
[97] Virgílio A. F. Almeida,et al. In search of invariants for e-business workloads , 2000, EC '00.
[98] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD 2000.
[99] Christopher Olston,et al. What's new on the web?: the evolution of the web from a search engine perspective , 2004, WWW '04.
[100] Anna Patterson. Why Writing Your Own Search Engine Is Hard , 2004, ACM Queue.
[101] Andreas Rauber,et al. Uncovering Information Hidden in Web Archives: A Glimpse at Web Analysis Building on Data Warehouses , 2002, D Lib Mag..
[102] Myra Spiliopoulou,et al. Analysis of navigation behaviour in web sites integrating multiple information systems , 2000, The VLDB Journal.
[103] Yanhong Li. Toward A Qualitative Search Engine , 1998, IEEE Internet Comput..
[104] Andrei Z. Broder,et al. Sic transit gloria telae: towards an understanding of the web's decay , 2004, WWW '04.
[105] Gerard Salton,et al. Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..
[106] Monika Henzinger,et al. Hyperlink Analysis for the Web , 2001, IEEE Internet Comput..
[107] Rick Kazman,et al. WebQuery: Searching and Visualizing the Web Through Connectivity , 1997, Comput. Networks.
[108] Martin Bergman,et al. The deep web: surfacing the hidden value , 2000 .
[109] Adam Kilgarriff,et al. Introduction to the Special Issue on the Web as Corpus , 2003, CL.
[110] Walid G. Aref,et al. Databases deepen the Web , 2004, Computer.
[111] Franco Scarselli,et al. Design of a crawler with bounded bandwidth , 2004, WWW Alt. '04.
[112] B. Pinkerton,et al. Finding What People Want: Experiences with the WebCrawler , 1994, WWW Spring 1994.
[113] Marco Gori,et al. Focused Crawling Using Context Graphs , 2000, VLDB.
[114] Mark Levene,et al. Zipf's Law for Web Surfers , 2001, Knowledge and Information Systems.
[115] Margo I. Seltzer,et al. World Wide Web Cache Consistency , 1996, USENIX Annual Technical Conference.
[116] Marc Najork,et al. Measuring Index Quality Using Random Walks on the Web , 1999, Comput. Networks.
[117] David M. Pennock,et al. Winners don't take all: Characterizing the competition for links on the web , 2002, Proceedings of the National Academy of Sciences of the United States of America.