An effective coreset compression algorithm for large scale sensor networks

The wide availability of networked sensors such as GPS receivers and cameras is enabling the creation of sensor networks that generate huge amounts of data. For example, vehicular sensor networks, in which in-car GPS probes are used to model and monitor traffic, can generate on the order of gigabytes of data in real time. How can we compress streaming high-frequency data from distributed sensors? In this paper, we construct coresets for streaming motion. The coreset of a data set is a small set that approximately represents the original data: running queries or fitting models on the coreset yields results similar to those obtained on the original data set. We present an algorithm for computing a small coreset of a large sensor data set. Surprisingly, the size of the coreset is independent of the size of the original data set. Combining map-and-reduce techniques with our coreset yields a system capable of compressing, in parallel, a stream of O(n) points using only O(log n) space and update time. We provide experimental results and compare the algorithm to the popular Douglas-Peucker heuristic for compressing GPS data.
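A common way to obtain the kind of O(log n) space and update bound described above is a merge-and-reduce tree: the stream is cut into buckets of m points, two buckets at the same level are merged and compressed back to m points, and the result is carried one level up, like incrementing a binary counter. The Python sketch below illustrates only that general technique under stated assumptions: points are (t, x, y) tuples, `m` is a hypothetical bucket size, and the reduce step `evenly_spaced` is a hypothetical placeholder that just keeps m evenly spaced points. It is not the paper's coreset construction and carries no approximation guarantee; any routine that maps a point sequence to at most m points could be plugged in through `reduce_fn`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed point layout: (t, x, y) = timestamp plus planar GPS coordinates.
Point = Tuple[float, float, float]
# A reduce step maps a time-ordered point sequence to at most m points.
ReduceFn = Callable[[Sequence[Point], int], List[Point]]


def evenly_spaced(points: Sequence[Point], m: int) -> List[Point]:
    """Hypothetical placeholder reduce step: keep m evenly spaced points.

    NOT the paper's coreset construction; it has no error guarantee.
    The first and last points are always kept.
    """
    if len(points) <= m:
        return list(points)
    step = (len(points) - 1) / (m - 1)
    return [points[round(i * step)] for i in range(m)]


class MergeReduceStream:
    """Streaming merge-and-reduce tree over a pluggable reduce step.

    levels[i] holds at most one bucket of <= m points that summarizes
    m * 2**i input points, so at most about log2(n / m) + 1 buckets are kept.
    """

    def __init__(self, m: int, reduce_fn: ReduceFn = evenly_spaced) -> None:
        if m < 2:
            raise ValueError("m must be at least 2")
        self.m = m
        self.reduce_fn = reduce_fn
        self.buffer: List[Point] = []  # newest raw points, fewer than m
        self.levels: List[Optional[List[Point]]] = []

    def add(self, p: Point) -> None:
        """Append one point; flush a full leaf bucket into the tree."""
        self.buffer.append(p)
        if len(self.buffer) == self.m:
            self._carry(self.buffer)
            self.buffer = []

    def _carry(self, bucket: List[Point]) -> None:
        # Like incrementing a binary counter: while the current level is
        # occupied, merge its (older) bucket with the new one, reduce the
        # result back to m points, and carry it one level up.
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            merged = self.levels[level] + bucket  # older points first
            bucket = self.reduce_fn(merged, self.m)
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(bucket)
        else:
            self.levels[level] = bucket

    def summary(self) -> List[Point]:
        """All stored points in time order (higher levels hold older data)."""
        out: List[Point] = []
        for bucket in reversed(self.levels):
            if bucket is not None:
                out.extend(bucket)
        out.extend(self.buffer)
        return out

    def stored_points(self) -> int:
        """Number of points currently kept in memory."""
        return len(self.buffer) + sum(len(b) for b in self.levels if b is not None)
```

Because the occupied levels mirror the binary representation of the number of full buckets, streaming 10,000 points with m = 16 (625 buckets, binary 1001110001) leaves five buckets of 16 points, i.e. 80 points, in memory. Since each merge touches only two sibling buckets, separate subtrees could in principle be built on different machines and merged afterwards.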
