Error-Bounded Approximation of Data Stream: Methods and Theories

With the growth of sensor networks and the Internet of Things, data volumes are increasing rapidly, and streaming data has attracted considerable attention. Compact data representations play an important role in processing and exploring data streams efficiently, because many stream mining tasks, such as clustering, classification, and correlation analysis, operate on approximations rather than on the original data items. In this chapter, we focus on maximum error-bounded approximation of data streams, which represents streaming data under a constraint on the approximation error at each data point. An approximation solution must meet two criteria: self-adaptation over time to a varying error bound, and real-time processing. We review existing data approximation techniques and summarize essential theoretical results, such as optimality guarantees. We then introduce two optimal linear-time algorithms for constructing an error-bounded piecewise linear representation of a data stream. One generates line segments through convex analysis of the data; the other works in a transformed space and can be extended to a general model. We analyze and compare the two spaces theoretically, and prove that the spaces, as well as the two algorithms, are theoretically equivalent.
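To make the per-point error constraint concrete, the following is a minimal sketch of a greedy error-bounded piecewise linear approximation. It is not either of the two optimal algorithms introduced in the chapter: it is a simplified baseline that anchors each segment at its first data point and narrows the range of feasible slopes as points arrive, so it may produce more segments than an optimal method. The names `greedy_pla`, `approx_value`, and `delta` are assumptions made for illustration, and timestamps are assumed to be strictly increasing.

```python
from typing import List, Tuple

Point = Tuple[float, float]                    # (timestamp, value)
Segment = Tuple[float, float, float, float]    # (t_start, t_end, slope, y_start)


def greedy_pla(points: List[Point], delta: float) -> List[Segment]:
    """Greedy piecewise linear approximation with max error <= delta.

    Illustrative baseline only: each segment is anchored at its first
    point, so the result is generally not the minimum number of segments.
    Assumes strictly increasing timestamps.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    segments: List[Segment] = []
    i, n = 0, len(points)
    while i < n:
        t0, y0 = points[i]
        lo, hi = float("-inf"), float("inf")   # feasible slope interval
        j = i + 1
        while j < n:
            t, y = points[j]
            dt = t - t0
            # Slopes that keep the line within delta of the new point.
            new_lo = max(lo, (y - delta - y0) / dt)
            new_hi = min(hi, (y + delta - y0) / dt)
            if new_lo > new_hi:                # no slope fits: close segment
                break
            lo, hi = new_lo, new_hi
            j += 1
        # A single-point segment has no slope constraint; use 0.
        slope = 0.0 if j == i + 1 else (lo + hi) / 2.0
        segments.append((t0, points[j - 1][0], slope, y0))
        i = j
    return segments


def approx_value(segments: List[Segment], t: float) -> float:
    """Evaluate the approximation at a timestamp covered by some segment."""
    for t_start, t_end, slope, y_start in segments:
        if t_start <= t <= t_end:
            return y_start + slope * (t - t_start)
    raise ValueError("timestamp not covered by any segment")
```

Each point is examined a constant number of times, so the sketch runs in linear time and can process a stream one point at a time. Anchoring at the first point is what costs it optimality.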
