SETS: Search Enhanced by Topic Segmentation

We present SETS, an architecture for efficient search in peer-to-peer networks that builds on ideas drawn from machine learning and social network theory. The key idea is to arrange participating sites in a topic-segmented overlay topology in which most connections are short-distance links between pairs of sites with similar content. Topically focused sets of sites are then joined into a single network by long-distance links. Queries are matched against these topic segments and routed only to the topically closest regions. We discuss a variety of design issues and tradeoffs that an implementor of SETS would face. We show that SETS is efficient in both network traffic and query processing load.
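The routing idea described above can be illustrated with a small sketch. It is not the SETS implementation: the abstract does not specify how sites or queries are represented, so this example assumes bag-of-words term counts, cosine similarity as the closeness measure, and a summed centroid per topic segment. All names (`cosine`, `centroid`, `route_query`, the `segments` dictionary) are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(site_vectors):
    """Summarize a topic segment by summing the term counts of its sites."""
    total = Counter()
    for vec in site_vectors:
        total.update(vec)
    return total


def route_query(query_terms, segments, k=1):
    """Return the names of the k segments topically closest to the query.

    Only these segments would receive the query; all others are skipped,
    which is where the savings in traffic and processing load come from.
    """
    q = Counter(query_terms)
    ranked = sorted(
        segments,
        key=lambda name: cosine(q, centroid(segments[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy network: two topic segments, two sites each.
segments = {
    "music": [Counter(["guitar", "jazz", "album"]), Counter(["jazz", "concert"])],
    "sports": [Counter(["football", "league"]), Counter(["league", "goal", "match"])],
}

print(route_query(["jazz", "album"], segments, k=1))  # prints ['music']
```

In this toy setup the query is forwarded to one segment out of two; in a larger network the parameter `k` would trade recall against the number of sites contacted.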
