Scalable Semantic Search with Hybrid Concept Index over Structured Peer-to-Peer Networks

The primary challenge in developing a peer-to-peer (P2P) file-sharing system is implementing an efficient keyword search mechanism. Current keyword search approaches for structured P2P networks are built on a distributed inverted index keyed by keywords. However, when executing multiple-attribute queries, they suffer from unscalable bandwidth consumption. Moreover, these approaches support only literal word matching and do not take the meaning of words into account. In this paper, we propose an efficient keyword search mechanism over structured P2P networks. Peers use a shared ontology to describe the content of a document and the subject of a query. A distributed hybrid concept index is constructed that efficiently supports query routing and matching and avoids intersecting inverted lists among peers, which is the cause of unscalable network bandwidth consumption. Based on the semantic similarity between the subjects of queries and the contents of documents, peers obtain results that match their queries semantically rather than by literal word matching. Simulation experiments show that keyword search with the proposed approach consumes much less bandwidth and achieves much higher retrieval performance than search based on a standard keyword inverted index.
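The Python sketch below illustrates the two problems the abstract names. It is not the hybrid concept index proposed in this paper. It assumes a keyword-partitioned inverted index that answers a multi-keyword query by sequentially intersecting posting lists across peers, a hypothetical miniature ontology (`ONTOLOGY`), and cosine similarity over concept-count vectors as a stand-in semantic measure. All names and data here are invented for illustration.

```python
from collections import Counter
from math import sqrt

# Toy posting lists for a keyword-partitioned distributed inverted index.
# In a DHT, each keyword's list would live on a different peer.
POSTINGS = {
    "p2p": set(range(0, 1000)),
    "search": set(range(500, 1500)),
    "ontology": set(range(900, 1000)),
}

def intersect_across_peers(terms):
    """Answer a multi-keyword query by sequential intersection.

    The peer responsible for terms[0] ships its list to the peer for
    terms[1], which intersects it and forwards the result, and so on.
    Returns the matching doc IDs and the number of doc IDs sent over the network.
    """
    candidates = set(POSTINGS[terms[0]])
    ids_sent = 0
    for term in terms[1:]:
        ids_sent += len(candidates)  # the candidate list crosses the network
        candidates &= POSTINGS[term]
    return candidates, ids_sent

# Hypothetical miniature ontology: different surface words map to one concept.
ONTOLOGY = {
    "p2p": "PeerToPeer",
    "peer-to-peer": "PeerToPeer",
    "search": "Retrieval",
    "retrieval": "Retrieval",
    "ontology": "Ontology",
}

def concept_vector(words):
    """Map words to ontology concepts and count them; unknown words are dropped."""
    return Counter(ONTOLOGY[w] for w in words if w in ONTOLOGY)

def cosine(u, v):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * v.get(c, 0) for c, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

if __name__ == "__main__":
    hits, cost = intersect_across_peers(["p2p", "search", "ontology"])
    print(len(hits), cost)  # 100 1500
    query = concept_vector(["p2p", "search"])
    doc = concept_vector(["peer-to-peer", "retrieval", "retrieval", "ontology"])
    print(round(cosine(query, doc), 3))  # 0.866
```

In this toy run, the three-keyword query returns only 100 documents but moves 1,500 document IDs between peers. The query and the document share no literal words, yet they score about 0.866 once both are mapped to concepts.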
