An Efficient Semantic-Based Search Schema in Unstructured P2P Network

Efficiently organizing and managing distributed resources is a critical issue in P2P systems. In this paper, we focus on both the P2P search mechanism and the topology of the overlay network, which are the key factors that determine the performance of resource search and discovery. We are mainly concerned with three issues: the heterogeneity of nodes in P2P networks; the match between the topology of the P2P overlay network and that of the underlying physical network; and the observation that, in general, peers holding similar shared resources have similar requirements. Taking these factors into account, we propose a semantic-based and locality-aware hybrid P2P architecture (SLHP), together with methods for constructing and maintaining the model and a multi-root mechanism that avoids a single point of failure. SLHP considers physical locality and the semantic similarity of shared resources at the same time: nodes are clustered into domains according to physical distance, and the nodes within a domain are clustered into groups according to the similarity of their shared resources. Physical distance is measured by the RTT between two peers, and a method for calculating semantic similarity is also given. We then propose a semantic-based search schema. When a search is issued, the query is forwarded to the super peers, and among super peers it is forwarded to the neighbors whose shared resources are most similar to the query keyword. We also prove that, under the construction and maintenance algorithms of SLHP, an existing resource is guaranteed in theory to be found. Compared with the standard flooding scheme, the semantic-based search reduces the number of messages transferred, since it produces only k messages per hop and forwards them to the most promising peers. As the routing indices, which represent each node's knowledge of the entire overlay network, are continuously optimized, the hit ratio can be improved.
Finally, to evaluate the performance of the semantic-based search schema, we run three sets of simulations that examine the key features of the network topology and of the semantic-based search algorithm. Experimental results show that the semantic-based search schema reduces the search space and achieves a good trade-off between performance and overhead.
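
The forwarding rule summarized above (each super peer sends the query only to the k neighbors whose shared resources are most similar to it) can be illustrated with a minimal sketch. The abstract does not reproduce the paper's similarity measure, so Jaccard similarity over keyword sets is used here purely as a stand-in; the function names, the `routing_index` layout, and the peer identifiers are all hypothetical and not taken from SLHP.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (an assumption, not the paper's measure):
    size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_next_hops(query: Set[str],
                     routing_index: Dict[str, Set[str]],
                     k: int) -> List[str]:
    """Return up to k neighbor ids, most similar to the query first.

    routing_index maps a neighbor id to the keywords this node believes
    that neighbor shares (a hypothetical layout for illustration).
    Ties are broken by neighbor id so the result is deterministic.
    """
    ranked = sorted(routing_index,
                    key=lambda n: (-jaccard(query, routing_index[n]), n))
    return ranked[:k]


# Hypothetical routing index held by one super peer.
routing_index = {
    "sp_a": {"jazz", "blues", "vinyl"},
    "sp_b": {"linux", "kernel"},
    "sp_c": {"jazz", "piano"},
    "sp_d": {"cooking"},
}

# Only k = 2 messages leave this hop, instead of one per neighbor
# as in flooding.
next_hops = select_next_hops({"jazz", "piano"}, routing_index, k=2)
print(next_hops)
```

With four neighbors, flooding would send four messages from this hop, whereas the sketch sends two, to the neighbors ranked highest by the stand-in similarity. How the routing index is populated and refined over time is left out here.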