Copula analysis of mixture models

Contemporary computing systems collect databases that can be too large for classical methods to handle. This work considers data whose observations are distribution functions, rather than the single numerical values of classical data, and presents a computational statistical methodology for grouping those distributions into classes. The clustering method links the partition being sought to the decomposition of mixture densities, through the notions of a function of distributions and of multidimensional copulas. The technique is illustrated by identifying distinct temperature and humidity regions in a global climate dataset, and the results compare favorably with those obtained from the standard EM algorithm.
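
As a hedged sketch of how copulas can enter such a mixture decomposition (the notation here is assumed for illustration and is not taken from the text): suppose each observed distribution function F is evaluated at chosen points t_1, ..., t_n, let H denote the joint distribution function of the values (F(t_1), ..., F(t_n)), and let G_j denote its j-th margin. Sklar's theorem guarantees a copula C that couples H to its margins, and a K-class mixture can then be written with hypothetical class-specific copulas C_k, class-specific margins G_j^k, and mixing proportions p_k:

```latex
\begin{align}
H(x_1,\dots,x_n) &= C\bigl(G_1(x_1),\dots,G_n(x_n)\bigr), \\
H(x_1,\dots,x_n) &= \sum_{k=1}^{K} p_k\, C_k\bigl(G_1^{k}(x_1),\dots,G_n^{k}(x_n)\bigr),
\qquad p_k \ge 0,\quad \sum_{k=1}^{K} p_k = 1 .
\end{align}
```

Under these assumptions, clustering amounts to estimating the p_k, the parameters of each C_k, and the class margins, then assigning each observed distribution to the class it most plausibly belongs to.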
