P2P OLAP: Data model, implementation and case study

Business groups today often own several companies that operate autonomously. Nevertheless, these companies are required to provide headquarters with summarized information for decision-making. An architecture for the cooperative interchange of decision-making information is a natural solution to this problem. We propose a peer-to-peer (P2P) architecture for processing OLAP data in a distributed environment, in such a way that every company involved maintains full autonomy over the use of its own data resources. In this scenario, data exchange between peers occurs when one of them, in the role of a local peer, receives a query and, in order to answer it, requests data available at other nodes, called acquaintances. No global schema is assumed to exist for any data under this computing paradigm. Hence, data provided by an acquaintance of a local peer must be adapted so that answers to queries posed by local peer users conform to the view those users have of their data. Because multidimensional data normally consist of a collection of views of aggregated data, a careful translation process is needed to transform any summary concept that appears in an acquaintance into a summary concept meaningful to the requesting peer. We first present a model for multidimensional data distributed in a P2P network, together with a query rewriting technique that allows a local peer to propagate OLAP queries among its acquaintances and obtain a meaningful and correct answer. Mappings are performed using a novel technique called revise and map, based on belief revision concepts. Revising a dimension instance makes it possible to produce consistent aggregations when an OLAP query is answered at more than one node. We then describe an implementation of a P2P system for answering OLAP queries over a network of data warehouses.
We apply our proposal to a real-world case study of an insurance group. Finally, we report the results of an experimental evaluation of our implementation, and discuss the issues that must be accounted for in this setting.
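
As a rough illustration of the translation step described above, the following sketch shows a local peer answering a SUM query by combining its own facts with facts from one acquaintance, after re-expressing the acquaintance's categories in local terms. It is not the revise and map technique itself, which the abstract does not detail. All names, categories, figures, and the mapping table are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical facts held by the local peer: (local category, amount).
LOCAL_FACTS = [("Auto", 100.0), ("Home", 50.0)]

# Hypothetical facts held by an acquaintance, under its own categories.
ACQUAINTANCE_FACTS = [("Car", 40.0), ("Motorcycle", 10.0), ("House", 25.0)]

# Assumed mapping from acquaintance categories to local categories.
MAPPING = {"Car": "Auto", "Motorcycle": "Auto", "House": "Home"}


def translate(facts, mapping):
    """Re-express acquaintance facts in the local peer's categories.

    Facts whose category has no mapping are skipped rather than guessed.
    """
    return [(mapping[cat], value) for cat, value in facts if cat in mapping]


def answer_sum_query(local_facts, acquaintance_facts, mapping):
    """Answer a SUM-by-category query at the local peer from both sources."""
    totals = defaultdict(float)
    for cat, value in local_facts + translate(acquaintance_facts, mapping):
        totals[cat] += value
    return dict(totals)


result = answer_sum_query(LOCAL_FACTS, ACQUAINTANCE_FACTS, MAPPING)
print(result)
```

The answer is expressed only in the local peer's categories, so local users never see the acquaintance's schema. Dropping unmapped categories is a design choice made for this sketch; a real system would need a principled treatment of such cases to keep aggregations consistent.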
