The Wasserstein-Fourier Distance for Stationary Time Series

We introduce a novel framework for analysing stationary time series based on optimal transport distances and spectral embeddings. First, we represent each time series by its power spectral density (PSD), which summarises how the signal energy is spread across the Fourier spectrum. Second, we endow the space of PSDs with the Wasserstein distance, capitalising on its unique ability to preserve the geometric information of a set of distributions. These two steps enable us to define the Wasserstein-Fourier (WF) distance, which allows us to compare stationary time series even when they differ in sampling rate, length, magnitude and phase. We analyse the features of WF by combining the properties of the Wasserstein distance with those of the Fourier transform. The proposed WF distance is then used in three sets of time series applications on real-world datasets: (i) interpolation of time series leading to data augmentation, (ii) dimensionality reduction via non-linear PCA, and (iii) parametric and non-parametric classification tasks. Our conceptual and experimental findings validate the general concept of using divergences between distributions, especially the Wasserstein distance, to analyse time series by comparing their spectral representations.
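The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It makes three assumptions: the PSD is estimated with a plain periodogram, only the one-sided (non-negative frequency) spectrum is used, and the 2-Wasserstein distance is approximated through the quantile functions of the normalised PSDs. The function names `normalised_psd`, `quantile` and `wf_distance`, and the parameter `n_levels`, are hypothetical.

```python
import numpy as np


def normalised_psd(x, fs):
    """Periodogram on non-negative frequencies, scaled to sum to one.

    Assumption: a raw periodogram stands in for the PSD estimate.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # drop the DC component
    power = np.abs(np.fft.rfft(x)) ** 2   # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequencies in Hz
    return freqs, power / power.sum()     # unit mass removes magnitude


def quantile(freqs, weights, levels):
    """Generalised inverse CDF of a discrete distribution on `freqs`."""
    cdf = np.cumsum(weights)
    idx = np.searchsorted(cdf, levels, side="left")
    return freqs[np.minimum(idx, len(freqs) - 1)]


def wf_distance(x, fs_x, y, fs_y, n_levels=2000):
    """Approximate 2-Wasserstein distance between normalised PSDs.

    In one dimension, W_2^2 is the integral over (0, 1) of the squared
    difference of the two quantile functions; here it is approximated
    by a midpoint rule with `n_levels` points.
    """
    levels = (np.arange(n_levels) + 0.5) / n_levels
    qx = quantile(*normalised_psd(x, fs_x), levels)
    qy = quantile(*normalised_psd(y, fs_y), levels)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


if __name__ == "__main__":
    # Two pure tones with different sampling rate, duration,
    # magnitude and phase.
    t1 = np.arange(200) / 100.0           # 2 s sampled at 100 Hz
    x = np.sin(2 * np.pi * 5 * t1)        # 5 Hz tone
    t2 = np.arange(200) / 50.0            # 4 s sampled at 50 Hz
    y = 4.0 * np.sin(2 * np.pi * 8 * t2 + 1.0)  # 8 Hz tone
    print(wf_distance(x, 100.0, y, 50.0))
```

Because each PSD is expressed in physical frequency and normalised to unit mass, the two signals can be compared although they were recorded at different rates and durations. For the two tones in the usage example the normalised PSDs are essentially point masses at 5 Hz and 8 Hz, so the printed value should be close to 3. Discarding the Fourier phase is what makes the comparison insensitive to time shifts.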
