Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes

We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. Our derivation provides a unifying framework for all kernels on dynamical systems currently used in machine learning, including kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. In the case of linear time-invariant systems, we derive explicit formulae for computing the proposed Binet-Cauchy kernels by solving Sylvester equations, and relate the proposed kernels to existing kernels based on cepstrum coefficients and subspace angles. We present efficient methods for computing our kernels, which make them viable for the practitioner. Besides their theoretical appeal, these kernels can be used efficiently in the comparison of video sequences of dynamic scenes that can be modeled as the output of a linear time-invariant dynamical system. One advantage of our kernels is that they take the initial conditions of the dynamical systems into account. As a first example, we use our kernels to compare video sequences of dynamic textures. As a second example, we apply our kernels to the problem of clustering short clips of a movie. Experimental evidence shows superior performance of our kernels.
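To illustrate how a kernel between two linear time-invariant systems can reduce to a Sylvester-type equation, the following is a minimal sketch, not the paper's exact formulation. It assumes noise-free systems x_{t+1} = A x_t, y_t = C x_t and a discounted kernel of the form k = sum_t exp(-lam t) y_t^T W y'_t. Under these assumptions the sum collapses to k = x_0^T M x'_0, where M solves M = exp(-lam) A^T M A' + C^T W C'. The function names, the discount `lam`, and the weight matrix `W` are hypothetical choices made for this illustration.

```python
import numpy as np


def solve_discounted_sylvester(A1, A2, Q, lam):
    """Solve M = exp(-lam) * A1.T @ M @ A2 + Q by vectorization.

    Uses the column-major identity vec(A X B) = kron(B.T, A) @ vec(X).
    Assumes exp(-lam) * ||A1|| * ||A2|| < 1 so the underlying series converges.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    K = np.kron(A2.T, A1.T)  # acts on vec(M) as M -> A1.T @ M @ A2
    lhs = np.eye(n1 * n2) - np.exp(-lam) * K
    vec_m = np.linalg.solve(lhs, Q.flatten(order="F"))
    return vec_m.reshape((n1, n2), order="F")


def trace_kernel(sys1, sys2, lam=0.5, W=None):
    """Kernel between two noise-free LTI systems given as (A, C, x0) tuples."""
    A1, C1, x1 = sys1
    A2, C2, x2 = sys2
    if W is None:
        W = np.eye(C1.shape[0])  # assumption: identity weighting of outputs
    M = solve_discounted_sylvester(A1, A2, C1.T @ W @ C2, lam)
    return float(x1 @ M @ x2)


def trace_kernel_bruteforce(sys1, sys2, lam=0.5, W=None, steps=500):
    """Reference value: truncate the discounted sum over simulated outputs."""
    A1, C1, x1 = sys1
    A2, C2, x2 = sys2
    if W is None:
        W = np.eye(C1.shape[0])
    total = 0.0
    for t in range(steps):
        total += np.exp(-lam * t) * float((C1 @ x1) @ W @ (C2 @ x2))
        x1, x2 = A1 @ x1, A2 @ x2  # advance both systems one step
    return total


def random_system(rng, n_state, n_out):
    """Hypothetical helper: a random system with spectral norm below 1."""
    A = rng.standard_normal((n_state, n_state))
    A /= 1.1 * np.linalg.norm(A, 2)
    C = rng.standard_normal((n_out, n_state))
    x0 = rng.standard_normal(n_state)
    return A, C, x0


rng = np.random.default_rng(0)
sys_a = random_system(rng, n_state=3, n_out=5)
sys_b = random_system(rng, n_state=4, n_out=5)
k_closed = trace_kernel(sys_a, sys_b)
k_brute = trace_kernel_bruteforce(sys_a, sys_b)
```

Because the kernel depends on x_0 and x'_0, two systems with identical dynamics but different initial conditions yield different kernel values, which matches the advantage highlighted in the abstract. The Kronecker-product solve is only practical for small state dimensions, since the linear system has n1·n2 unknowns; a dedicated Sylvester solver would be the natural replacement for larger models.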
