Vine copula mixture models and clustering for non-Gaussian data

Most finite mixture models cannot represent asymmetric tail dependence within components, and they fail to capture non-elliptical clusters in clustering applications. Because vine copulas capture these types of dependence flexibly, we propose a novel vine copula mixture model for continuous data. We discuss model selection and parameter estimation and formulate a new model-based clustering algorithm. Using vine copulas in clustering allows the clusters to take a range of shapes and dependence structures. Our simulation experiments show a significant gain in clustering accuracy when the components exhibit pronounced asymmetric tail dependence and/or non-Gaussian margins. We also apply the proposed method to real data sets. The results show that model-based clustering with vine copula mixture models outperforms other model-based clustering techniques, especially for non-Gaussian multivariate data.
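
As a rough sketch of what such a model looks like, a finite mixture whose components are built from copulas can be written with Sklar's theorem and the standard regular-vine pair-copula decomposition. The notation here (K, \pi_k, f_{jk}, F_{jk}, c_k, E_i, and the edge labels a_e, b_e, D_e) is assumed for illustration and is not taken from the paper itself.

```latex
% Mixture density with K components and weights \pi_k (assumed notation)
g(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, g_k(\mathbf{x}),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1 .

% Sklar's theorem: each component splits into margins and a copula density
g_k(\mathbf{x}) = c_k\bigl(F_{1k}(x_1), \dots, F_{dk}(x_d)\bigr)
                  \prod_{j=1}^{d} f_{jk}(x_j) .

% Regular-vine decomposition of a d-dimensional copula density into
% bivariate pair copulas over the edges E_i of the vine trees
c_k(\mathbf{u}) = \prod_{i=1}^{d-1} \prod_{e \in E_i}
  c_{a_e, b_e ; D_e}\bigl(u_{a_e \mid D_e},\, u_{b_e \mid D_e}\bigr) .

% Model-based clustering: assign x to the component with the largest
% posterior probability
\hat{z}(\mathbf{x}) = \arg\max_{k}
  \frac{\pi_k \, g_k(\mathbf{x})}{\sum_{l=1}^{K} \pi_l \, g_l(\mathbf{x})} .
```

Because each pair copula and each margin can come from a different parametric family, a component of this form need not be elliptical or tail-symmetric, which is the flexibility the abstract refers to.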
