Of Ivory and Smurfs: Loxodontan MapReduce Experiments for Web Search

This paper describes Ivory, an attempt to build a distributed retrieval system around the open-source Hadoop implementation of MapReduce. We focus on three noteworthy aspects of our work: a retrieval architecture built directly on the Hadoop Distributed File System (HDFS), a scalable MapReduce algorithm for inverted indexing, and webpage classification to enhance retrieval effectiveness.

Jimmy J. Lin | Tamer Elsayed | Lidan Wang | Donald Metzler

[1] James Allan, et al. Minimal test collections for retrieval evaluation, 2006, SIGIR.

[2] Brian D. Davison, et al. Web page classification: Features and algorithms, 2009, CSUR.

[3] Craig MacDonald, et al. Comparing Distributed Indexing: To MapReduce or Not?, 2009, LSDS-IR@SIGIR.

[4] James Allan, et al. Topic detection and tracking: event-based information organization, 2002.

[5] Sanjay Ghemawat, et al. The Google file system, 2003.

[6] Ian H. Witten, et al. Managing Gigabytes: Compressing and Indexing Documents and Images, 1999.

[7] Christophe Bisciglia, et al. Cluster computing for web-scale data processing, 2008, SIGCSE '08.

[8] Tim Leek, et al. Probabilistic approaches to topic detection and tracking, 2002.

[9] Ben Carterette, et al. Million Query Track 2007 Overview, 2008, TREC.

[10] Luiz André Barroso, et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009.

[11] Wilson C. Hsieh, et al. Bigtable: A Distributed Storage System for Structured Data, 2006, TOCS.

[12] Jimmy J. Lin. Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce, 2009, SIGIR.

[13] Claudio Lucchese, et al. 7th workshop on large-scale distributed systems for information retrieval (LSDS-IR'09), 2009, SIGF.

[14] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[15] Tie-Yan Liu, et al. Learning to rank for information retrieval, 2009, SIGIR.

[16] W. Bruce Croft, et al. A Markov random field model for term dependencies, 2005, SIGIR '05.

[17] Donald Metzler, et al. Beyond bags of words: effectively modeling dependence and features in information retrieval, 2008, SIGF.

[18] Justin Zobel, et al. Inverted files for text search engines, 2006, CSUR.

[19] Jimmy J. Lin, et al. Exploring Large-Data Issues in the Curriculum: A Case Study with MapReduce, 2008.