Wasserstein Stationary Subspace Analysis

Learning under nonstationarity can be achieved by decomposing the data into a stationary and a nonstationary subspace [stationary subspace analysis (SSA)]. While SSA has been applied in a variety of settings, its robustness and computational efficiency are limited by the difficulty of optimizing its Kullback–Leibler-divergence-based objective. In this paper, we extend SSA in two ways: we propose SSA variants with 1) higher numerical efficiency, by deriving analytical solutions, and 2) higher robustness, by employing the Wasserstein-2 distance (Wasserstein SSA). We demonstrate the usefulness of our novel algorithms on toy data, illustrating their mathematical properties, and on real-world data, where they 1) allow better segmentation of time series and 2) improve brain–computer interfacing, in which the Wasserstein-based measure of nonstationarity is used to regularize spatial filters and yields higher decoding performance.
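
To make the Wasserstein-2-based notion of nonstationarity concrete, the sketch below scores a candidate projection by the closed-form Bures–Wasserstein distance between projected epoch covariances and their average. This is an illustrative assumption rather than the paper's actual optimization routine; the function names bures_wasserstein2 and nonstationarity_score are hypothetical.

    # Minimal sketch (not the authors' implementation): closed-form
    # Wasserstein-2 (Bures) distance between zero-mean Gaussians,
    # W2^2 = tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}),
    # used as a nonstationarity score for a candidate projection B.
    import numpy as np
    from scipy.linalg import sqrtm

    def bures_wasserstein2(S1, S2):
        """Squared Wasserstein-2 distance between N(0, S1) and N(0, S2)."""
        root = sqrtm(S1)
        cross = sqrtm(root @ S2 @ root)
        return np.trace(S1 + S2 - 2 * np.real(cross))

    def nonstationarity_score(B, epoch_covs):
        """Sum of W2^2 distances between each projected epoch covariance
        and their average; smaller values mean a more stationary subspace.
        B has shape (d, k): k candidate stationary directions in d channels."""
        projected = [B.T @ C @ B for C in epoch_covs]
        mean_cov = sum(projected) / len(projected)
        return sum(bures_wasserstein2(C, mean_cov) for C in projected)

    # Toy usage: two epochs whose covariances differ only outside the
    # subspace spanned by the first coordinate.
    C1 = np.diag([1.0, 1.0])
    C2 = np.diag([1.0, 4.0])
    B_stat = np.array([[1.0], [0.0]])     # stationary direction
    B_nonstat = np.array([[0.0], [1.0]])  # nonstationary direction
    print(nonstationarity_score(B_stat, [C1, C2]))     # ~0
    print(nonstationarity_score(B_nonstat, [C1, C2]))  # > 0

A full SSA-style method would then search over projections B to minimize such a score; here the score is only shown as a nonstationarity measure.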
