Model Based Clustering of Audio Clips Using Gaussian Mixture Models

The task of clustering multivariate trajectory data of varying length arises in many domains. Model-based methods can handle trajectories of varying length without altering their length or structure. Hidden Markov models (HMMs) are widely used to model trajectory data, but they are not well suited to trajectories of long duration. In this paper, we propose a similarity-based representation for multivariate, varying-length trajectories of long duration using Gaussian mixture models (GMMs). Each trajectory is modeled by its own GMM, and the log-likelihood of a trajectory under a given GMM is used as a similarity score. The scores for all trajectories in the data set under all the GMMs form a score matrix, which is then used in a clustering algorithm. The proposed model-based clustering method is applied to audio clips, which are multivariate trajectories of varying length and long duration. It performs much better than a method that uses a fixed-length representation of each audio clip based on perceptual features.
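The score-matrix construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: diagonal covariances, two mixture components, a plain EM fit written with the standard library, normalization of each log-likelihood by the number of frames (so that clips of different length are comparable), and synthetic two-dimensional "trajectories" in place of real audio features. All function names are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm(frames, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to one trajectory with plain EM (assumed setup)."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from random frames and variances from the global variance.
    means = [list(f) for f in rng.sample(frames, n_components)]
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            logp = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means, and floored variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                        for d in range(dim)]
            variances[k] = [
                max(sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, frames)) / nk, var_floor)
                for d in range(dim)]
    return weights, means, variances

def avg_log_likelihood(frames, gmm):
    """Per-frame average log-likelihood of a trajectory under a GMM.

    Dividing by the number of frames is an assumption, not stated in the source.
    """
    weights, means, variances = gmm
    total = sum(logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                           for w, m, v in zip(weights, means, variances)])
                for x in frames)
    return total / len(frames)

def score_matrix(trajectories):
    """Entry [i][j] scores trajectory i under the GMM fitted to trajectory j."""
    gmms = [fit_gmm(t) for t in trajectories]
    return [[avg_log_likelihood(t, g) for g in gmms] for t in trajectories]

def toy_trajectory(center, length, rng):
    """Synthetic stand-in for a clip's feature frames (illustrative only)."""
    return [[rng.gauss(center, 1.0), rng.gauss(-center, 1.0)]
            for _ in range(length)]

rng = random.Random(1)
trajectories = [toy_trajectory(0.0, 120, rng), toy_trajectory(0.0, 200, rng),
                toy_trajectory(6.0, 150, rng), toy_trajectory(6.0, 90, rng)]
S = score_matrix(trajectories)
```

Each row of `S` is a fixed-length vector of similarity scores for one trajectory, regardless of how many frames that trajectory has, so the rows can be handed to any standard clustering algorithm. In this toy setup, trajectories drawn from the same source score much higher under each other's GMMs than under those of the other source.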
