ClusterSheddy: Load Shedding Using Moving Clusters over Spatio-temporal Data Streams

Moving object environments are characterized by large numbers of objects continuously sending location updates. At times, data arrival rates may spike, causing the load on the system to exceed its capacity. This may increase output latency, potentially leading to invalid or obsolete answers. Randomly dropping data, the approach to load shedding most frequently used in the literature, may adversely affect the accuracy of the results. We therefore propose a load-shedding technique customized for spatio-temporal stream data. In our model, spatio-temporal properties such as location, time, direction, and speed serve as critical factors in the load-shedding decision. The main idea is to abstract similarly moving objects into moving clusters, which serve as summaries of their members' movement. Depending on resource constraints, cluster members may be selectively discarded while their locations are approximated by their respective moving clusters. Our experimental study illustrates the performance gains achieved by our load-shedding framework and the tradeoff between the amount of data shed and result accuracy.
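To make the idea of summarizing members by a moving cluster more concrete, here is a minimal Python sketch under stated assumptions. It is not the ClusterSheddy algorithm. The greedy assignment rule, the thresholds `DIST_THRESHOLD` and `SPEED_THRESHOLD`, the uniform `keep_fraction` shedding policy, and all names (`MovingCluster`, `cluster_updates`, `shed_load`, `locate`) are hypothetical. In this sketch, an object joins a cluster when it is close to the cluster centroid and has a similar velocity. Under load, some members' positions are discarded, and queries about those members are answered with the cluster centroid.

```python
import math
from dataclasses import dataclass, field

# Hypothetical thresholds: the abstract names location, direction and speed
# as clustering criteria but gives no concrete values.
DIST_THRESHOLD = 50.0   # max distance from a cluster's centroid
SPEED_THRESHOLD = 5.0   # max difference between velocity vectors


@dataclass
class MovingCluster:
    """Hypothetical summary of a group of similarly moving objects."""
    x: float    # centroid position
    y: float
    vx: float   # average velocity
    vy: float
    members: dict = field(default_factory=dict)  # oid -> (x, y), retained members
    shed: set = field(default_factory=set)        # oids whose locations were discarded

    def count(self):
        return len(self.members) + len(self.shed)

    def accepts(self, x, y, vx, vy):
        # Join only if spatially close AND moving with a similar velocity.
        close = math.hypot(x - self.x, y - self.y) <= DIST_THRESHOLD
        similar = math.hypot(vx - self.vx, vy - self.vy) <= SPEED_THRESHOLD
        return close and similar

    def add(self, oid, x, y, vx, vy):
        # Running mean keeps centroid and average velocity up to date.
        n = self.count() + 1
        self.x += (x - self.x) / n
        self.y += (y - self.y) / n
        self.vx += (vx - self.vx) / n
        self.vy += (vy - self.vy) / n
        self.members[oid] = (x, y)


def cluster_updates(updates):
    """Greedily assign each (oid, x, y, vx, vy) update to the first compatible cluster."""
    clusters = []
    for oid, x, y, vx, vy in updates:
        target = next((c for c in clusters if c.accepts(x, y, vx, vy)), None)
        if target is None:
            target = MovingCluster(x, y, vx, vy)
            clusters.append(target)
        target.add(oid, x, y, vx, vy)
    return clusters


def shed_load(clusters, keep_fraction):
    """Discard member locations, keeping at least one member per cluster."""
    for c in clusters:
        keep = max(1, math.ceil(len(c.members) * keep_fraction))
        for oid in list(c.members)[keep:]:
            del c.members[oid]
            c.shed.add(oid)


def locate(clusters, oid):
    """Exact position for retained members; cluster centroid for shed ones."""
    for c in clusters:
        if oid in c.members:
            return c.members[oid]
        if oid in c.shed:
            return (c.x, c.y)
    return None


updates = [
    ("a", 0.0, 0.0, 1.0, 0.0),
    ("b", 10.0, 0.0, 1.0, 0.5),
    ("c", 20.0, 0.0, 1.5, 0.0),
    ("d", 500.0, 500.0, -3.0, 0.0),
]
clusters = cluster_updates(updates)
shed_load(clusters, keep_fraction=0.5)
print(len(clusters))          # 2
print(locate(clusters, "a"))  # (0.0, 0.0), retained exactly
print(locate(clusters, "c"))  # (10.0, 0.0), approximated by the centroid
```

A uniform keep fraction is the simplest possible policy. A real shedder would more plausibly decide which members to drop based on how well the cluster approximates each of them and on the current resource budget.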
