Clustering a Global Field of Atmospheric Profiles by Mixture Decomposition of Copulas

Abstract This work focuses on clustering a large dataset of atmospheric vertical profiles of temperature and humidity in order to model a priori information for the problem of retrieving atmospheric variables from satellite observations. Each profile is described by cumulative distribution functions (cdfs) of temperature and specific humidity. The method presented here extends the mixture density problem to this kind of data. It allows dependencies both within and between temperature and moisture to be taken into account through copula functions, which are particular distribution functions that link a (joint) multivariate distribution with its (marginal) univariate distributions. After a presentation of vertical profiles of temperature and humidity and of the method used to transform them into cdfs, the clustering method is detailed and then applied to provide a partition into seven clusters based, first, on the temperature profiles only; second, on the humidity profiles o...
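As a rough illustration of how a copula links marginal cdfs to a joint distribution, the sketch below builds empirical cdfs from two small samples and combines them as H(x, y) = C(F(x), G(y)), which is Sklar's theorem. The choice of the Frank family, the parameter value `theta`, the sample values, and the helper names `frank_copula` and `empirical_cdf` are all assumptions made for this example; they are not taken from the abstract and should not be read as the authors' implementation.

```python
import bisect
import math


def frank_copula(u, v, theta):
    """Frank copula C_theta(u, v) for u, v in [0, 1] and theta != 0."""
    if theta == 0:
        raise ValueError("theta must be non-zero (theta -> 0 is the independence limit)")
    # expm1/log1p keep the formula numerically stable for small arguments.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    den = math.expm1(-theta)
    return -math.log1p(num / den) / theta


def empirical_cdf(sample):
    """Return a function t -> fraction of sample values less than or equal to t."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(t):
        return bisect.bisect_right(ordered, t) / n

    return cdf


# Made-up values for illustration only (not data from the paper).
temperature = [210.0, 225.0, 250.0, 270.0, 288.0]  # K
humidity = [0.01, 0.1, 1.0, 4.0, 9.0]              # g/kg

F = empirical_cdf(temperature)  # marginal cdf of temperature
G = empirical_cdf(humidity)     # marginal cdf of specific humidity

# Joint cdf value built from the two marginals through the copula.
joint = frank_copula(F(250.0), G(1.0), theta=5.0)
```

Any copula leaves the marginals unchanged, so `frank_copula(u, 1.0, theta)` returns `u` up to rounding; only the dependence between the two variables is controlled by `theta`.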
