HiPPIS: an online P2P system for efficient lookups on d-dimensional hierarchies

In this paper we describe HiPPIS, a system that enables efficient storage and on-line querying of multidimensional data organized into concept hierarchies and dispersed over a network. Our scheme utilizes an adaptive algorithm that automatically adjusts the level of indexing according to the granularity of the incoming queries, without assuming any prior knowledge of the workload. Efficient roll-up and drill-down operations take place in order to maximize the performance by minimizing query flooding. Extensive experimental evaluations show that, on top of the advantages that a distributed storage offers, our method answers the large majority of incoming queries, both point and aggregate ones, without flooding the network. At the same time, it manages to preserve the hierarchical nature of data. These characteristics are maintained even after sudden shifts in the workload.

[1]  Ian T. Foster,et al.  Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design , 2002, ArXiv.

[2]  Karl Aberer,et al.  The chatty web: emergent semantics through gossiping , 2003, WWW '03.

[3]  Pablo Rodriguez,et al.  I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system , 2007, IMC '07.

[4]  Beng Chin Ooi,et al.  PeerDB: a P2P-based system for distributed data sharing , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[5]  Evaggelia Pitoura,et al.  Content-Based Routing of Path Queries in Peer-to-Peer Systems , 2004, EDBT.

[6]  Sandhya Dwarkadas,et al.  Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.

[7]  Theoni Pitoura,et al.  Replication, Load Balancing and Efficient Range Query Processing in DHTs , 2006, EDBT.

[8]  Laks V. S. Lakshmanan,et al.  QC-trees: an efficient summary structure for semantic OLAP , 2003, SIGMOD '03.

[9]  Verena Kantere,et al.  GrouPeer: Dynamic clustering of P2P databases , 2009, Inf. Syst..

[10]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[11]  Ian T. Foster,et al.  Mapping the Gnutella Network , 2002, IEEE Internet Comput..

[12]  Karl Aberer,et al.  GridVine: Building Internet-Scale Semantic Overlay Networks , 2004, SEMWEB.

[13]  Hongjun Lu,et al.  Condensed cube: an effective approach to reducing data cube size , 2002, Proceedings 18th International Conference on Data Engineering.

[14]  G. Weikum Querying the Internet with PIER , 2005 .

[15]  Vijay Gopalakrishnan,et al.  Adaptive replication in peer-to-peer systems , 2004, 24th International Conference on Distributed Computing Systems, 2004. Proceedings..

[16]  Yannis Sismanis,et al.  Hierarchical dwarfs for the rollup cube , 2003, DOLAP '03.

[17]  Hans-Peter Kriegel,et al.  The DC-tree: a fully dynamic index structure for data warehouses , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).