AbIx: An Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems

In recent years, peer-to-peer (P2P) environments have attracted significant interest in the data management community. However, almost all work so far has focused on exact query processing in P2P data systems, and the autonomy of peers has not been adequately considered. In addition, system cost is very high because shared data is published per document rather than per document set. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed, and structured P2P data systems. They can be used to search as few peers as possible and still obtain as many results satisfying users’ queries as possible, while preserving a high degree of peer autonomy. Abstract indices also have low system cost, improve query processing speed, and support very frequent updates and set-based information publishing. To verify the effectiveness of abstract indices, a simulator with 10,000 peers and over 3 million documents was built, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.
