Introducing the Portuguese web archive initiative
[1] Marc Najork, et al. On near-uniform URL sampling, 2000, Comput. Networks.
[2] Kristinn Sigurðsson. Managing duplicates across sequential crawls, 2010.
[3] Torsten Suel, et al. Efficient search in large textual collections with redundancy, 2007, WWW '07.
[4] T. Drugeon. A technical approach for the French web legal deposit, 2005.
[5] Michael Herscovici, et al. Efficient Indexing of Versioned Document Sequences, 2007, ECIR.
[6] Miguel Costa, et al. Optimizing Ranking Calculation in Web Search Engines: a Case Study, 2004, SBBD.
[7] Nathaniel S. Borenstein, et al. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, 1996, RFC.
[8] Allan Arvidson, et al. The Kulturarw Project: The Swedish Royal Web Archive, 1998.
[9] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[10] Serge Abiteboul, et al. A First Experience in Archiving the French Web, 2002, ECDL.
[11] Yi-fang Brook Wu, et al. Domain-specific keyphrase extraction, 2005, CIKM '05.
[12] Marc Najork, et al. Detecting spam web pages through content analysis, 2006, WWW '06.
[13] Marc Moens, et al. Named Entity Recognition without Gazetteers, 1999, EACL.
[14] Hector Garcia-Molina, et al. Web Spam Taxonomy, 2005, AIRWeb.
[15] José Luis Borbinha, et al. A Deposit for Digital Collections, 2001, ECDL.
[16] Michael Stack. Full Text Search of Web Archive Collections, 2005.
[17] Marc Najork, et al. On the evolution of clusters of near-duplicate Web pages, 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[18] Jon Postel, et al. Domain Name System Structure and Delegation, 1994, RFC.
[19] Daniel Gomes, et al. The Viúva Negra crawler: an experience report, 2008, Softw. Pract. Exp.
[20] Ricardo A. Baeza-Yates, et al. Characterization of national Web domains, 2007, TOIT.
[21] Daniel Gomes, et al. Design and Selection Criteria for a National Web Archive, 2006, ECDL.
[22] Marc Najork, et al. A large-scale study of the evolution of Web pages, 2003, WWW '03.
[23] Marc Najork, et al. Mercator: A scalable, extensible Web crawler, 1999, World Wide Web.
[24] David Wolinsky, et al. On the Design of Virtual Machine Sandboxes for Distributed Computing in Wide-area Overlays of Virtual Workstations, 2006, First International Workshop on Virtualization Technology in Distributed Computing (VTDC 2006).
[25] Otis Gospodnetic, et al. Lucene in Action (In Action series), 2004.
[26] Mário J. Silva, et al. Searching and Archiving the Web with Tumba, 2003.
[27] New products, 1940, Electrical Engineering.
[28] Amanda Spink, et al. Real life, real users, and real needs: a study and analysis of user queries on the web, 2000, Inf. Process. Manag.
[29] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[30] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.
[31] Gerhard Weikum, et al. A Time Machine for Text Search, 2007, SIGIR '07.
[32] Andrei Z. Broder, et al. Indexing Shared Content in Information Retrieval Systems, 2006, EDBT.
[33] Brad Tofel. ‘Wayback’ for Accessing Web Archives, 2007.
[34] Daniel Gomes, et al. Web modelling for web warehouse design, 2007.
[35] Ian H. Witten, et al. Managing Gigabytes: Compressing and Indexing Documents and Images, 1999.