A Bayesian nonparametric mixture model for grouping dependence structures and selecting copula functions

Abstract. The demand for advanced dependence modeling arises in a variety of fields, including finance, insurance and health science. When analyzing dependent data, modeling the dependence structure properly is important but challenging, and it is needed for valid and efficient inference. Grouping the data according to the similarity of their dependence structures is necessary, especially for small samples. A copula-based model, indexed by copula selection indicators and dependence parameters, is introduced to describe dependent data and to group similar dependence structures. For inference, a Bayesian nonparametric method is proposed, the mixture of Dirichlet process mixture copula model (M-DPM-CM), in which the prior distributions are specified as Dirichlet processes. Extensive simulation studies were conducted to evaluate the proposed procedure. The results show that the M-DPM-CM can recover the true grouping structure and achieves high accuracy in copula model selection under various finite-sample settings. Finally, the M-DPM-CM is applied to the Vertebral Column dataset from the UCI Machine Learning Repository.
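The abstract describes grouping by dependence structure only in words. As a rough illustration, and not the authors' specification, the Python sketch below draws group labels from a Dirichlet process prior DP(alpha, G0) using the Blackwell-MacQueen Pólya urn, so that every group shares one atom made of a copula selection indicator and a dependence parameter. The candidate families (Gaussian, Clayton, Gumbel), the base measure G0 (family uniform over the three, Kendall's tau uniform on (0.05, 0.95)) and the names `tau_to_theta`, `draw_base_atom` and `polya_urn_partition` are all hypothetical choices made for this example. The maps from Kendall's tau to each family's parameter are the standard ones.

```python
import math
import random

# Hypothetical candidate families for the copula selection indicator.
FAMILIES = ("gaussian", "clayton", "gumbel")


def tau_to_theta(family, tau):
    """Map Kendall's tau in (0, 1) to the family's dependence parameter."""
    if family == "gaussian":
        return math.sin(math.pi * tau / 2.0)  # correlation rho
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)        # theta > 0
    if family == "gumbel":
        return 1.0 / (1.0 - tau)              # theta >= 1
    raise ValueError(f"unknown family: {family}")


def draw_base_atom(rng):
    """Draw one (family, theta) atom from an assumed base measure G0:
    family uniform over FAMILIES, Kendall's tau uniform on (0.05, 0.95)."""
    family = rng.choice(FAMILIES)
    tau = rng.uniform(0.05, 0.95)
    return family, tau_to_theta(family, tau)


def polya_urn_partition(n, alpha, seed=0):
    """Sample group labels and shared (family, theta) atoms for n units
    from a DP(alpha, G0) prior via the Blackwell-MacQueen urn."""
    rng = random.Random(seed)
    labels, counts, atoms = [], [], []
    for i in range(n):
        # Existing group k has weight counts[k]; a new group has weight alpha.
        # After normalization: counts[k] / (i + alpha) and alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(atoms):
            # Open a new group with a fresh atom drawn from G0.
            atoms.append(draw_base_atom(rng))
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels, atoms


if __name__ == "__main__":
    labels, atoms = polya_urn_partition(n=20, alpha=1.0, seed=42)
    for k, (family, theta) in enumerate(atoms):
        print(k, family, round(theta, 3), labels.count(k))
```

This sketch covers only the prior. In a full analysis the group assignments would also be weighted by how well each candidate copula fits the data, typically through a Gibbs-type sampler. Larger values of `alpha` tend to produce more groups.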