Distributing and searching concept hierarchies: an adaptive DHT-based system

Concept hierarchies greatly help in organizing and reusing information and are widely used in information systems applications. In this paper, we describe a method for efficiently storing and querying data that is organized into concept hierarchies and dispersed over a DHT. In our method, each peer individually decides its level of indexing according to the granularity of the incoming queries. Roll-up and drill-down operations are performed on a per-node basis in order to minimize the bandwidth required to answer queries at variable aggregation levels. We motivate our approach by applying it to a large-scale Grid system: our fully decentralized scheme creates, queries and updates large volumes of hierarchical data on-line, replacing the traditional centralized and strictly indexed information systems. Our extensive experimental results support this approach across many diverse configurations: the system proves very efficient under skewed workloads, both over single and over multiple hierarchy levels at the same time. It adapts to sudden changes in popularity and stores and updates large amounts of data at very low cost.
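
As a rough illustration of per-node indexing levels, the following Python sketch shows one way a peer might hash hierarchical records at a chosen level and re-index them coarser (roll-up) or finer (drill-down) as query granularity shifts. It is not the paper's algorithm: the hierarchy `LEVELS`, the functions `dht_key`, `choose_level` and `reindex`, the SHA-1 prefix hashing, and the majority-vote policy are all assumptions made for this example.

```python
import hashlib
from collections import Counter

# Hypothetical three-level concept hierarchy, coarsest level first.
LEVELS = ("country", "city", "site")


def dht_key(path, level):
    """Hash the path prefix down to `level` into a 160-bit identifier.

    Assumption: records are tuples with one value per hierarchy level,
    and the DHT key is the SHA-1 hash of the joined prefix.
    """
    prefix = "/".join(path[: level + 1])
    return int(hashlib.sha1(prefix.encode("utf-8")).hexdigest(), 16)


def choose_level(recent_query_levels, current_level):
    """Pick the level most often requested by recent queries.

    Assumed policy: simple majority vote; on a tie, stay at the current
    level if it is among the winners, otherwise take the coarsest winner.
    """
    if not recent_query_levels:
        return current_level
    counts = Counter(recent_query_levels)
    best = max(counts.values())
    winners = sorted(lvl for lvl, n in counts.items() if n == best)
    return current_level if current_level in winners else winners[0]


def reindex(records, level):
    """Group records under their DHT key at `level`.

    Moving to a smaller level index is a roll-up (fewer, coarser keys);
    moving to a larger one is a drill-down (more, finer keys).
    """
    index = {}
    for path in records:
        index.setdefault(dht_key(path, level), []).append(path)
    return index


records = [
    ("Greece", "Athens", "site-1"),
    ("Greece", "Athens", "site-2"),
    ("Greece", "Patras", "site-3"),
]
level = choose_level([1, 1, 2], current_level=2)  # most queries target "city"
index = reindex(records, level)
print(LEVELS[level], len(index))
```

In this sketch, indexing at a coarser level collapses many records under one key, so a query at that granularity needs a single lookup, while finer levels spread records over more keys.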
