Variational Inference and Learning of Piecewise-linear Dynamical Systems

Modeling the temporal behavior of data is of paramount importance in many scientific and engineering fields. The baseline method assumes that both the dynamic and observation models are linear-Gaussian. Non-linear extensions lead to intractable solvers. It is also possible to consider several linear models, or a piecewise-linear model, and to combine them with a switching mechanism; exact inference is then also intractable because the number of Gaussian components grows exponentially. In this paper, we propose a variational approximation of piecewise-linear dynamical systems. We provide full details of the derivation of a variational expectation-maximization algorithm that can be used either as a filter or as a smoother. We show that the model parameters can be split into two sets: a set of static (or observation) parameters and a set of dynamic parameters. The immediate consequences are that the former set can be estimated off-line and that the number of linear models (or, equivalently, the number of states of the switching variable) can be learned by model selection. We apply the proposed method to visual tracking, and we thoroughly compare our algorithm with several visual trackers on the problem of head-pose estimation.
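To make the switching mechanism and the resulting blow-up concrete, the following is a minimal sketch of a scalar switching linear dynamical system, not the model or the algorithm proposed in the paper. All names (`simulate_slds`, `num_exact_components`, and the parameters `a`, `b`, `q`, `r`, `trans`) are assumptions introduced for illustration. It assumes a first-order Markov switching variable with K states, one affine dynamic model per state, and an identity observation model with Gaussian noise.

```python
import random


def simulate_slds(a, b, q, r, trans, T, seed=0):
    """Simulate a scalar switching linear dynamical system (illustrative only).

    a[k], b[k]  : slope and offset of the k-th linear dynamic model
    q, r        : standard deviations of the process and observation noise
    trans[j][k] : probability of switching from model j to model k
    T           : number of time steps
    """
    rng = random.Random(seed)
    K = len(a)
    s = rng.randrange(K)  # initial switching state, drawn uniformly
    x = 0.0               # initial latent state
    states, latents, observations = [], [], []
    for _ in range(T):
        # Draw the next switching state from row s of the transition matrix.
        s = rng.choices(range(K), weights=trans[s])[0]
        # Apply the linear-Gaussian dynamics selected by the switching state.
        x = a[s] * x + b[s] + rng.gauss(0.0, q)
        # Observe the latent state through additive Gaussian noise.
        y = x + rng.gauss(0.0, r)
        states.append(s)
        latents.append(x)
        observations.append(y)
    return states, latents, observations


def num_exact_components(K, T):
    """Number of Gaussian components in the exact posterior after T steps.

    Each of the K**T possible switching sequences contributes one component.
    """
    return K ** T


if __name__ == "__main__":
    trans = [[0.9, 0.1], [0.2, 0.8]]
    states, latents, obs = simulate_slds(
        a=[0.95, 0.5], b=[0.0, 1.0], q=0.1, r=0.2, trans=trans, T=50
    )
    print(len(obs), num_exact_components(K=2, T=50))
```

Under these assumptions, the component count is K to the power T, which is why exact filtering or smoothing is impractical even for short sequences and why an approximation, such as the variational one the abstract proposes, is needed.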
