Finite mixtures, projection pursuit and tensor rank: a triangulation

Finite mixtures of multivariate distributions play a fundamental role in model-based clustering. However, they pose several problems, especially in the presence of many irrelevant variables. Dimension reduction methods, such as projection pursuit, are commonly used to address these problems. In this paper, we use skewness-maximizing projections to recover the subspace which optimally separates the cluster means. Skewness might then be removed in order to search for other potentially interesting data structures or to perform skewness-sensitive statistical analyses, such as Hotelling's $T^{2}$ test. Our approach is algebraic in nature and deals with the symmetric tensor rank of the third multivariate cumulant. We also derive closed-form expressions for the symmetric tensor rank of the third cumulants of several multivariate mixture models, including mixtures of skew-normal distributions and mixtures of two symmetric components with proportional covariance matrices. Theoretical results in this paper shed some light on the connection between the estimated number of mixture components and their skewness.
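
The link between skewness and the separation of cluster means can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes a two-component normal mixture with a common covariance matrix and unequal weights, in which case the third cumulant of the whitened data is, up to sampling error, a rank-one symmetric tensor aligned with the whitened mean difference. The leading left singular vector of the unfolded third moment is used here as a simple stand-in for the skewness-maximizing direction. All names and numerical values (`delta`, `w_diag`, `p`, `n`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) mixture: two normal components with a common
# diagonal covariance, unequal weights, means 0 and delta.
n, p = 20000, 0.2
delta = np.array([2.0, -1.0, 0.0, 0.0, 1.0])
w_diag = np.array([1.0, 2.0, 0.5, 1.0, 1.5])   # within-cluster variances
d = delta.size

labels = rng.random(n) < p                      # True -> component with mean delta
x = rng.standard_normal((n, d)) * np.sqrt(w_diag) + labels[:, None] * delta

# Whiten the data with the inverse symmetric square root of the sample covariance.
xc = x - x.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
s_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
z = xc @ s_inv_sqrt

# Sample third moment of the whitened data (equal to the third cumulant,
# since z is centered), stored as a d x d x d symmetric tensor.
m3 = np.einsum("ni,nj,nk->ijk", z, z, z) / n

# Unfold to a d x d^2 matrix; for a rank-one symmetric tensor c * u(x)u(x)u,
# the leading left singular vector is u up to sign.
u = np.linalg.svd(m3.reshape(d, d * d), full_matrices=False)[0][:, 0]

# Compare with the whitened mean difference (sign is irrelevant).
target = s_inv_sqrt @ delta
cosine = abs(u @ target) / (np.linalg.norm(u) * np.linalg.norm(target))
print(f"|cos| between recovered direction and whitened mean difference: {cosine:.3f}")
```

With equal weights (`p = 0.5`) the third cumulant of this mixture vanishes, so the approach sketched here carries no information about the mean difference in that case.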
