The Mixture of Multi-kernel Relevance Vector Machines Model

We present a new regression mixture model where each mixture component is a multi-kernel version of the Relevance Vector Machine (RVM). The proposed model exploits the enhanced modeling capability of RVMs that results from their embedded sparsity-enforcing properties. Moreover, robustness with respect to the kernel parameters is achieved by employing a weighted multi-kernel scheme. The mixture model is trained using the maximum a posteriori (MAP) approach, applying the Expectation-Maximization (EM) algorithm, which offers closed-form update equations for the model parameters. An incremental learning methodology is also presented to tackle the parameter initialization problem of the EM algorithm. The efficiency of the proposed mixture model is empirically demonstrated on the time series clustering problem using various artificial and real benchmark datasets, and by comparison with other regression mixture models.
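
As a rough illustration of the kind of procedure summarized above, the following sketch runs EM on a mixture of kernel regressions to cluster time series. It is a simplified stand-in and not the authors' algorithm: a single fixed Gaussian prior precision (`alpha`) replaces the per-weight sparsity hyperparameters of the RVM, the multi-kernel scheme is reduced to stacking Gaussian kernels of several widths into one design matrix without learned kernel weights, and the responsibilities are initialized at random rather than incrementally. The function names, parameters, and default values are all hypothetical.

```python
import numpy as np

def multi_kernel_design(t, centers, widths):
    """Design matrix: a bias column plus one block of Gaussian kernels per width."""
    blocks = [np.exp(-0.5 * ((t[:, None] - centers[None, :]) / w) ** 2)
              for w in widths]
    return np.hstack([np.ones((t.size, 1))] + blocks)

def em_regression_mixture(Y, Phi, K, n_iter=50, alpha=1.0, seed=0):
    """EM with MAP weight updates for a K-component regression mixture.

    Y   : (N, T) array holding N time series of common length T.
    Phi : (T, M) design matrix shared by all components.
    alpha is a fixed prior precision on the weights (an assumption of this
    sketch; an RVM would instead learn one precision per weight).
    """
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    M = Phi.shape[1]
    R = rng.dirichlet(np.ones(K), size=N)  # random initial responsibilities
    W = np.zeros((K, M))                   # regression weights per component
    sigma2 = np.ones(K)                    # noise variance per component
    pi = np.full(K, 1.0 / K)               # mixing proportions
    for _ in range(n_iter):
        # M-step: closed-form updates given the current responsibilities.
        Nk = R.sum(axis=0) + 1e-12
        pi = Nk / N
        for k in range(K):
            ybar = R[:, k] @ Y / Nk[k]     # responsibility-weighted mean series
            # MAP (ridge-like) weight update, using the previous sigma2[k].
            A = Phi.T @ Phi + (alpha * sigma2[k] / Nk[k]) * np.eye(M)
            W[k] = np.linalg.solve(A, Phi.T @ ybar)
            resid = Y - Phi @ W[k]
            sigma2[k] = R[:, k] @ (resid ** 2).sum(axis=1) / (Nk[k] * T) + 1e-8
        # E-step: posterior probability of each component for each series.
        sq = ((Y[:, None, :] - (W @ Phi.T)[None, :, :]) ** 2).sum(axis=2)
        logp = np.log(pi) - 0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
    return R, W, pi, sigma2
```

A series is assigned to the component with the largest responsibility, for example `labels = R.argmax(axis=1)`. Because the result of EM depends on the starting responsibilities, the random initialization used here is exactly the weak point that an incremental scheme is meant to address.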
