E-Embed: A time series visualization framework based on earth mover's distance

Abstract. Time series analysis is an important topic in machine learning, and a suitable visualization method can facilitate data mining. In this paper, we propose E-Embed, a novel framework that visualizes time series data by projecting them into a low-dimensional space while preserving the underlying data structure. In E-Embed, we model each time series as a discrete distribution and measure the distance between series with the earth mover’s distance (EMD). Once the pairwise distances are computed, the data can be visualized with dimensionality reduction algorithms. To work effectively with dimensionality reduction methods that depend on a K-nearest neighbor (KNN) graph, such as Isomap, we propose an algorithm for constructing a KNN graph based on the EMD. We evaluate the framework on both univariate and multivariate time series data. Experimental results demonstrate that E-Embed provides high-quality visualizations at low computational cost.
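
As a rough illustration of the pipeline described in the abstract (distribution modeling, EMD, KNN graph), the following minimal sketch may help. It rests on several assumptions that are not taken from the paper: each univariate series is treated as an empirical distribution over its values with uniform weights, all series have equal length, and the KNN graph is built by brute force rather than by the proposed construction algorithm. The function names `emd_1d`, `pairwise_emd`, and `knn_graph` are hypothetical.

```python
from itertools import combinations


def emd_1d(a, b):
    """EMD between two equal-size empirical distributions with uniform weights.

    Assumption of this sketch: in one dimension with equal sample counts,
    the optimal transport plan matches sorted values in order, so the cost
    is the mean absolute gap between the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length series")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def pairwise_emd(series):
    """Symmetric matrix of EMD values between all pairs of series."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = emd_1d(series[i], series[j])
    return dist


def knn_graph(dist, k):
    """Brute-force KNN graph: map each node to its k nearest other nodes."""
    graph = {}
    for i, row in enumerate(dist):
        others = sorted((d, j) for j, d in enumerate(row) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph


# Toy data (made up for illustration): two groups of similar series.
series = [
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.1, 2.1, 3.1],
    [10.0, 11.0, 12.0, 13.0],
    [10.5, 11.5, 12.5, 13.5],
]
dist = pairwise_emd(series)
graph = knn_graph(dist, k=1)
```

The resulting distance matrix or KNN graph could then be passed to a dimensionality reduction method such as Isomap to obtain low-dimensional coordinates. Note that this value-distribution model discards temporal ordering, so it is only one possible reading of "discrete distributions"; the paper's actual modeling may differ.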
