Efficient Discovery of Authoritative Resources

Given a dynamic corpus whose content and attention change daily, is it possible to collect and maintain its high-quality resources with minimal investment? We address two problems that arise from this question for hyperlinked corpora such as Web pages or blogs: how to efficiently discover the correct set of authoritative resources in a fixed network, and how to track these resources over time as new entrants arrive, old standbys depart, and existing participants change roles.
