Brown Dwarf: Distributing the Power of OLAP to Unstructured P2P Overlays

In this paper we present the Brown Dwarf, a distributed system designed to efficiently store, query and update multidimensional data over a Peer-to-Peer overlay. The Brown Dwarf distributes a highly effective centralized structure among peers on the fly. Both point and aggregate queries are then answered online by cooperating nodes that hold parts of a fully or partially materialized data cube. Updates are also performed online, eliminating the usually costly overnight update process. To tackle dynamic shifts in skew as well as network and node failures, our system employs an adaptive replication scheme that creates copies of various units of the distributed data structure according to the load and the churn rate of the network. This process, called mirroring, ensures balanced load distribution, guarantees resilience and allows for smooth query resolution even in the most dynamic environments. Extensive experiments with the current implementation show that our system achieves fair storage and load distribution with minimum overhead under variable data and query sets. It adapts quickly even after sudden bursts in load and remains unaffected when up to 10% of the nodes fail. These measurements identify Brown Dwarf as a robust and efficient system for distributing a data cube.
