A hierarchical modeling approach for clustering probability density functions

Clustering probability density functions is a problem that arises in a growing number of scientific domains. Existing methods focus mainly on the univariate setting and rely on heuristic clustering solutions. New aspects of the problem, namely the multivariate setting and a model-based perspective, are investigated. The proposed approach relies on hierarchical mixture modeling of the data. It is introduced in the univariate context and then extended to multivariate densities by means of a factorial model that performs dimension reduction. Model fitting is carried out with an EM algorithm. The method is illustrated through simulated experiments and applied to two real data sets, where its performance is compared with that of alternative clustering strategies.
