Workshop on large-scale distributed systems for information retrieval

The Workshop on Large-Scale Distributed Systems for Information Retrieval was a venue for seminal ideas on the design of systems for search. The workshop focused mainly on mechanisms for P2P IR, currently a highly popular research area, but it also featured fruitful discussions and presentations on other architectures for large-scale systems. Given the attendance and the quality of the discussion, we conclude that systems for information retrieval are a growing and promising area of research.
