Exploiting compression and approximation paradigms for effective and efficient online analytical processing over sensor network readings in data grid environments

Aggregate queries are useful tools in sensor network‐based systems because they extract knowledge from huge amounts of summarized readings for knowledge discovery purposes. However, data representation and query models are problematic issues in managing sensor network data, because the streams produced by sensors are theoretically unbounded. In this paper, we present SensorGrid, a Grid framework based on data compression and approximation paradigms that provides approximate answers to aggregate queries on summarized sensor network data. These queries are the basis for achieving effective and efficient Online Analytical Processing (OLAP) over sensor network readings in Data Grid environments. We also present our experience with a real‐life system for managing environmental sensor network data. A further contribution of our research is an extensive experimental evaluation and analysis of SensorGrid, which focuses on two main classes of aggregate range queries over sensor readings: (i) window queries, which apply an SQL aggregation operator over a fixed window of the reading stream produced by the sensor network, and (ii) continuous queries, which instead consider a ‘moving’ window and produce a stream of answers as output. Both classes of queries are extremely useful for extracting summarized knowledge to be exploited by OLAP‐like analysis tools over sensor network data. The experimental results, obtained on both synthetic and real‐life data sets, confirm the benefits of embedding data compression and approximation paradigms into Grid‐based, data‐intensive sensor network management systems. Copyright © 2013 John Wiley & Sons, Ltd.
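
To make the two query classes concrete, the following is a minimal Python sketch that evaluates them exactly over an in-memory list of readings. It is not SensorGrid's implementation and does not include the compression or approximation layer. The `(timestamp, value)` reading layout, the function names `window_query` and `continuous_query`, and the assumption that readings arrive in non-decreasing timestamp order are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical reading layout: (timestamp, value).
Reading = Tuple[int, float]


def window_query(readings: Iterable[Reading], start: int, end: int,
                 agg: Callable[[List[float]], float] = sum) -> float:
    """Window query: one aggregate over the fixed window start <= t <= end."""
    return agg([v for t, v in readings if start <= t <= end])


def continuous_query(readings: Iterable[Reading], width: int,
                     agg: Callable[[List[float]], float] = sum
                     ) -> Iterator[Tuple[int, float]]:
    """Continuous query: a moving window that yields a stream of answers.

    Assumes timestamps are non-decreasing. After each reading at time t,
    the aggregate is taken over readings with timestamp in (t - width, t].
    """
    window = deque()
    for t, v in readings:
        window.append((t, v))
        # Evict readings that have slid out of the moving window.
        while window[0][0] <= t - width:
            window.popleft()
        yield t, agg([x for _, x in window])


if __name__ == "__main__":
    stream = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
    print(window_query(stream, 2, 3))          # single answer
    print(list(continuous_query(stream, 2)))   # one answer per reading
```

The window query returns a single value, whereas the continuous query is written as a generator so that it can, in principle, consume an unbounded stream while keeping only the readings inside the current window in memory.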
