An effective coreset compression algorithm for large scale sensor networks

The wide availability of networked sensors such as GPS and cameras is enabling the creation of sensor networks that generate huge amounts of data. For example, vehicular sensor networks, in which in-car GPS sensor probes are used to model and monitor traffic, can generate on the order of gigabytes of data in real time. How can we compress streaming high-frequency data from distributed sensors? In this paper we construct coresets for streaming motion. The coreset of a data set is a small set that approximately represents the original data: running queries or fitting models on the coreset yields results similar to those obtained on the original data set. We present an algorithm for computing a small coreset of a large sensor data set. Surprisingly, the size of the coreset is independent of the size of the original data set. Combining map-and-reduce techniques with our coreset yields a system capable of compressing in parallel a stream of O(n) points using space and update time that is only O(log n). We provide experimental results and compare the algorithm to the popular Douglas-Peucker heuristic for compressing GPS data.
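For readers unfamiliar with the baseline named in the abstract, the following is a minimal Python sketch of the Douglas-Peucker heuristic. It is not the coreset algorithm of the paper, and it makes several assumptions for illustration: points are 2D tuples, error is measured as Euclidean distance to the chord segment, and the names `point_segment_distance`, `douglas_peucker`, and `epsilon` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (all 2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Return a subsequence of points that keeps both endpoints.

    A point is kept when it is the farthest point from the current chord
    and lies more than epsilon away from it (assumed tolerance parameter).
    """
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    # Explicit stack instead of recursion, so long traces do not hit
    # Python's recursion limit.
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        max_dist, index = 0.0, None
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, index = d, i
        if index is not None and max_dist > epsilon:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, kept in zip(points, keep) if kept]
```

Calling `douglas_peucker(track, 0.5)` on a list of `(x, y)` tuples returns the retained points in their original order. Note that the output size of this heuristic depends on the input and on `epsilon`, whereas the abstract claims a coreset size that is independent of the size of the original data set.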
