Uncertain Time-Series Similarity: Return to the Basics

In recent years, the availability of continuous sensor measurements has increased considerably across a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering facilities (where measurements help ensure efficiency, product quality, and safety), hydrologic and geologic observing systems, pollution management, and others. Because sensor observations are inherently imprecise, much recent work has turned to querying, mining, and storing uncertain data. Uncertainty can also arise from data aggregation, privacy-preserving transforms, and error-prone mining algorithms. In this study, we survey the techniques that have been proposed specifically for modeling and processing uncertain time series, an important model for temporal data. We provide an analytical evaluation of the alternatives proposed in the literature, highlighting the advantages and disadvantages of each approach, and further compare these alternatives with two additional techniques that had been carefully studied before. We conduct an extensive experimental evaluation with 17 real datasets and discuss some surprising results, which suggest that taking into account the temporal correlations in the time series is a fruitful research direction. Based on our evaluations, we also provide guidelines for practitioners in the field.
