Managing duplicates in a web archive
Daniel Gomes | Mário J. Silva | André L. Santos
[1] Torsten Suel,et al. Design and implementation of a high-performance distributed Web crawler , 2002, Proceedings 18th International Conference on Data Engineering.
[2] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[3] Mário J. Silva,et al. Searching and Archiving the Web with Tumba , 2003 .
[4] Marc Najork,et al. Mercator: A scalable, extensible Web crawler , 1999, World Wide Web.
[5] Daniel Gomes,et al. Characterizing a national community web , 2005, TOIT.
[6] Hector Garcia-Molina,et al. Archival storage for digital libraries , 1998, DL '98.
[7] Anna Patterson. Why Writing Your Own Search Engine Is Hard , 2004, ACM Queue.
[8] Brian Berliner,et al. CVS II: Parallelizing Software Development , 1998 .
[9] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[10] Ben Y. Zhao,et al. Pond: The OceanStore Prototype , 2003 .
[11] Windsor W. Hsu,et al. Duplicate Management for Reference Data , 2004 .
[12] Michalis Vazirgiannis,et al. Archiving the Greek Web , 2004 .
[13] Josh Macdonald,et al. Versioned File Archiving, Compression, and Distribution , 1999 .
[14] Miguel Costa,et al. The XLDB Group at CLEF 2004 , 2004, CLEF.
[15] D. B. Davis,et al. Sun Microsystems Inc. , 1993 .
[16] Hector Garcia-Molina,et al. SCAM: A Copy Detection Mechanism for Digital Documents , 1995, DL.
[17] Andrei Z. Broder,et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content , 1999, Comput. Networks.
[18] Ben Y. Zhao,et al. Pond: The OceanStore Prototype , 2003, FAST.
[19] Terence Kelly,et al. Aliasing on the world wide web: prevalence and performance implications , 2002, WWW '02.
[20] Daniel Gomes,et al. Versus: A Web Repository , 2002 .
[21] Timo Burkard,et al. Herodotus: A Peer-to-Peer Web Archival System , 2002 .
[22] Jeffrey C. Mogul,et al. A trace-based analysis of duplicate suppression in HTTP , 2000 .
[23] Christos T. Karamanolis,et al. Evaluation of Efficient Archival Storage Techniques , 2004, MSST.
[24] José Luis Borbinha,et al. A Deposit for Digital Collections , 2001, ECDL.
[25] Ethan L. Miller,et al. A fast algorithm for online placement and reorganization of replicated data , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[26] M. O. Rabin. Probabilistic Algorithm in Finite Fields , 1979 .
[27] Hector Garcia-Molina,et al. Copy detection mechanisms for digital documents , 1995, SIGMOD '95.
[28] Daniel Gomes,et al. Webstore: A Manager for Incremental Storage of Contents , 2004 .
[29] Juha Hakala,et al. The NEDLIB harvester , 2001 .
[30] Sriram Raghavan,et al. Stanford WebBase components and applications , 2006, TOIT.
[31] Arkady B. Zaslavsky,et al. Signature Extraction for Overlap Detection in Documents , 2002, ACSC.
[32] Ohad Rodeh,et al. zFS - a scalable distributed file system using object disks , 2003, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003).
[33] Diomidis Spinellis,et al. The decay and failures of web references , 2003, CACM.
[34] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2004, Softw. Pract. Exp.
[35] Hector Garcia-Molina,et al. Finding Near-Replicas of Documents and Servers on the Web , 1998, WebDB.
[36] Renato Iannella,et al. Uniform Resource Names (URN) Namespace Definition Mechanisms , 2002, RFC.
[37] Chabane Djeraba. Dominos: A New Web Crawler's Design , 2004 .