Traditional search systems are either centralized or rely on flooding to ensure the accuracy of the returned results, and both approaches perform poorly. This paper presents SPSS (Semantic Peer-to-Peer Search System), which searches documents across a P2P network hierarchically, based on document semantic vectors generated by Latent Semantic Indexing (LSI) [1]. SPSS organizes content around its semantics in the P2P network, so the indices of semantically related documents are likely to be co-located. This reduces the search cost for a given query and allows SPSS to achieve accuracy comparable to that of centralized search systems. CAN [2] and a range addressable network are used to organize the computing nodes. Owing to the hierarchical overlay network, the average number of logical hops per query is smaller than in flat architectures. Both theoretical analysis and experimental results show that SPSS achieves higher accuracy with fewer logical hops.
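To make the notion of an LSI semantic vector concrete, the following is a minimal sketch of standard LSI using a truncated SVD in NumPy. It is not the SPSS implementation: the function names (`lsi_fit`, `lsi_fold_in`, `cosine`), the toy term-by-document matrix, and the choice of k = 2 are all assumptions made for illustration. The sketch only shows how documents and a query are projected into a low-dimensional semantic space where related documents end up close together, which is the property a system such as SPSS could exploit when placing indices in the overlay.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (hypothetical helper).

    Returns U_k (terms x k), s_k (k singular values), and one
    k-dimensional semantic vector per document (docs x k).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vecs = Vt[:k, :].T
    return U_k, s_k, doc_vecs


def lsi_fold_in(term_counts, U_k, s_k):
    """Project a term-count vector (query or new document) into the
    k-dimensional space: q_hat = q^T U_k S_k^{-1}."""
    return (term_counts @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity between two semantic vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy corpus (assumed data): rows are terms, columns are documents.
# Documents 0 and 1 share terms 0-1; documents 2 and 3 share terms 2-3.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])

U_k, s_k, doc_vecs = lsi_fit(A, k=2)

# A query that mentions only term 0.
query = np.array([1.0, 0.0, 0.0, 0.0])
q_vec = lsi_fold_in(query, U_k, s_k)

for j, d in enumerate(doc_vecs):
    print(f"doc {j}: similarity to query = {cosine(q_vec, d):.3f}")
```

In this sketch the query should score noticeably higher against documents 0 and 1 than against documents 2 and 3, even though it shares only one literal term with each of the first two. How SPSS maps such vectors onto CAN coordinates and the hierarchical overlay is not specified in this abstract and is not modeled here.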
[1] Elizabeth R. Jessup, et al. Matrices, Vector Spaces, and Information Retrieval. SIAM Rev., 1999.
[2] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis. J. Am. Soc. Inf. Sci., 1990.
[3] J. Ritter. Why Gnutella Can't Scale. No, Really. 2001.
[4] Antony I. T. Rowstron, et al. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. Middleware, 2001.
[5] Mark Handley, et al. A scalable content-addressable network. SIGCOMM '01, 2001.