Index recommendation tool for optimized information discovery over distributed hash tables

Peer-to-peer (P2P) networks allow for efficient information discovery in large-scale distributed systems. Although current P2P systems, in particular those based on distributed hash tables (DHTs), support point queries well, efficient support for more complex queries remains a challenge. Our research focuses on efficient support for multiattribute range (MAR) queries over DHT-based information discovery systems. Traditionally, MAR queries over DHTs have been supported either by creating an individual index for each data attribute or by creating a single index over the combination of all data attributes. In contrast to these approaches, we propose creating a set of indices over selected attribute combinations. Because every index incurs maintenance overhead, the total number of indices must be limited. The resulting problem is to create a limited number of indices such that overall system performance for MAR queries is optimal. In this paper, we propose an index recommendation tool that implements heuristic solutions to this NP-hard problem. Our evaluations show that these heuristics lead to close-to-optimal system performance for MAR queries.
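
The selection problem described above can be illustrated with a minimal sketch. It is not the heuristic implemented by the tool, which the abstract does not specify; it is a generic greedy selection under assumptions made up for illustration. The cost model (`query_cost`, with a hypothetical per-dimension overhead `dim_overhead`), the function names, and the representation of a query as a mapping from attribute to the fraction of that attribute's range it covers are all assumptions.

```python
from itertools import combinations
from math import prod


def query_cost(query, index, dim_overhead=0.05):
    """Toy cost of answering `query` with `index` (an assumption, not the paper's model).

    `query` maps attribute -> fraction of that attribute's range covered (0..1].
    Attributes of the index that the query leaves unconstrained count as 1.0.
    `dim_overhead` is a hypothetical penalty per indexed attribute.
    """
    visited = prod(query.get(attr, 1.0) for attr in index)
    return visited + dim_overhead * len(index)


def workload_cost(queries, indices):
    """Total cost when each query is routed to its cheapest available index."""
    if not indices:
        # Hypothetical baseline: without any index, each query costs 1.0.
        return float(len(queries))
    return sum(min(query_cost(q, idx) for idx in indices) for q in queries)


def recommend_indices(attributes, queries, max_indices):
    """Greedily pick up to `max_indices` attribute combinations.

    In each round, add the candidate that lowers the workload cost the most;
    stop early when no candidate improves the cost.
    """
    candidates = [
        frozenset(combo)
        for size in range(1, len(attributes) + 1)
        for combo in combinations(attributes, size)
    ]
    chosen = []
    current = workload_cost(queries, chosen)
    while len(chosen) < max_indices:
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = min(remaining, key=lambda c: workload_cost(queries, chosen + [c]))
        best_cost = workload_cost(queries, chosen + [best])
        if best_cost >= current:
            break
        chosen.append(best)
        current = best_cost
    return chosen, current


if __name__ == "__main__":
    # Hypothetical workload over two attributes.
    workload = [{"cpu": 0.1}, {"cpu": 0.1}, {"mem": 0.3}, {"cpu": 0.1, "mem": 0.3}]
    indices, cost = recommend_indices(["cpu", "mem"], workload, max_indices=2)
    print([sorted(i) for i in indices], round(cost, 2))
```

For this toy workload, the sketch first selects the combined index over `cpu` and `mem` and then the single-attribute index over `cpu`, which mirrors the idea of indexing selected attribute combinations instead of every attribute individually or all attributes together. Enumerating all combinations is exponential in the number of attributes, so this only works for small attribute sets.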
