Model-view sensor data management in the cloud

The unbounded nature of sensor data poses a serious challenge for query processing, even in a cloud infrastructure. Model-based sensor data approximation reduces the amount of data that query processing must handle, but in the worst case all modeled segments still need to be scanned. In this paper, we propose an innovative index for modeled segments in key-value stores, the KVI-index. The KVI-index consists of an in-memory tree component and a secondary structure, materialized in the key-value store, that maps the tree nodes to the modeled data segments. We then introduce a hybrid KVI-index-Scan-MapReduce approach for efficient query processing. A series of experiments in a real private cloud infrastructure shows that our approach outperforms both Hadoop-based parallel processing of the raw sensor data and multiple alternative indexing approaches for model-view data, in terms of both query response time and index update efficiency.
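To make the two-part structure (an in-memory tree plus a key-value mapping from tree nodes to segments) more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes that each modeled segment is a linear model over an integer time interval, that the in-memory tree is a static binary tree over a fixed time domain whose nodes are identified by their split values (each segment is registered at the first node whose split value falls inside its interval, in the style of an interval tree), and that plain Python dicts stand in for the key-value store. The names `Segment`, `KVIIndexSketch`, `range_query`, and `value_at` are hypothetical. Only the index lookup is shown; the scan and MapReduce parts of the hybrid approach are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """Assumed model: value(t) = slope * t + intercept for t in [start, end]."""
    seg_id: str
    start: int
    end: int
    slope: float
    intercept: float

    def value_at(self, t: int) -> float:
        return self.slope * t + self.intercept


class KVIIndexSketch:
    """HYPOTHETICAL sketch: a virtual binary tree over the time domain
    [lo, hi], where every node is keyed by its split value. Two dicts stand
    in for the key-value store: node key -> segment ids, and
    segment id -> segment model."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi
        self.kv_nodes: Dict[int, List[str]] = {}
        self.kv_segments: Dict[str, Segment] = {}

    def _fork_node(self, start: int, end: int) -> int:
        # Descend until the node's split value lies inside [start, end].
        lo, hi = self.lo, self.hi
        while True:
            mid = (lo + hi) // 2
            if end < mid:
                hi = mid - 1
            elif start > mid:
                lo = mid + 1
            else:
                return mid

    def insert(self, seg: Segment) -> None:
        if not (self.lo <= seg.start <= seg.end <= self.hi):
            raise ValueError("segment outside the indexed time domain")
        node = self._fork_node(seg.start, seg.end)
        self.kv_segments[seg.seg_id] = seg
        self.kv_nodes.setdefault(node, []).append(seg.seg_id)

    def range_query(self, qs: int, qe: int) -> List[Segment]:
        """Return the segments overlapping [qs, qe], sorted by start time."""
        result: List[Segment] = []
        stack: List[Tuple[int, int]] = [(self.lo, self.hi)]
        while stack:
            lo, hi = stack.pop()
            if lo > hi:
                continue
            mid = (lo + hi) // 2
            # Fetch only this node's entry from the (stand-in) store.
            for seg_id in self.kv_nodes.get(mid, []):
                seg = self.kv_segments[seg_id]
                if seg.start <= qe and seg.end >= qs:
                    result.append(seg)
            # Left subtree holds segments ending before mid; right subtree
            # holds segments starting after mid. Prune those that cannot overlap.
            if qs < mid:
                stack.append((lo, mid - 1))
            if qe > mid:
                stack.append((mid + 1, hi))
        return sorted(result, key=lambda s: s.start)

    def value_at(self, t: int) -> List[float]:
        """Evaluate the model(s) covering time t instead of reading raw data."""
        return [s.value_at(t) for s in self.range_query(t, t)]


idx = KVIIndexSketch(0, 1023)
idx.insert(Segment("s1", 0, 99, 0.5, 10.0))
idx.insert(Segment("s2", 100, 249, -0.1, 69.9))
idx.insert(Segment("s3", 250, 600, 0.0, 45.0))
print([s.seg_id for s in idx.range_query(90, 260)])  # ['s1', 's2', 's3']
print(idx.value_at(50))  # [35.0]
```

In this sketch, segments registered under pruned subtrees are never fetched from the stand-in store, which illustrates why keeping a small tree in memory and the segments in the key-value store can avoid scanning every modeled segment.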
