Dimensionality reduction for data of unknown cluster structure

Dimensionality reduction that preserves selected characteristics of the data is needed for numerous reasons. In this work we focus on data drawn from a mixture of Gaussian distributions and propose a method that preserves the distinctness of the clustering structure, even though that structure is assumed to be as yet unknown. The rationale behind the method is the following: (i) had one known the clusters (classes) within the data, one could facilitate further analysis and reduce the dimensionality of the space by projecting the data onto Fisher's linear subspace, which, by definition, best preserves the structure of the given classes; (ii) under some reasonable assumptions, this can be done, albeit approximately, without prior knowledge of the clusters (classes). In this paper we show how this approach works. We present a preliminary data transformation that brings the directions of largest overall variability close to the directions of best between-class separation. Hence, for the transformed data, simple PCA provides an approximation to Fisher's subspace. We show that the transformation preserves the distinctness of the unknown clustering structure to a large extent.
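
As a concrete point of reference for (i), the following minimal sketch (in Python, on synthetic data with parameters assumed purely for illustration; it is not the preliminary transformation proposed here) computes Fisher's discriminant direction for two known classes and compares it, via the principal angle, with the leading PCA direction obtained when the class labels are ignored. The within-class covariance is chosen so that the largest overall variability lies away from the between-class direction, which is precisely the situation the preliminary transformation is meant to remedy before PCA is applied.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)

# Synthetic two-component Gaussian mixture (illustrative parameters only).
d, n = 10, 2000
mu0 = np.zeros(d)
mu1 = np.r_[3.0, np.zeros(d - 1)]                 # classes separated along the first axis
cov = np.diag(np.r_[1.0, np.full(d - 1, 4.0)])    # largest within-class variance off that axis

labels = rng.integers(0, 2, n)
X = np.empty((n, d))
X[labels == 0] = rng.multivariate_normal(mu0, cov, (labels == 0).sum())
X[labels == 1] = rng.multivariate_normal(mu1, cov, (labels == 1).sum())

# (i) Fisher's discriminant direction for two *known* classes: Sw^{-1} (mean1 - mean0).
X0, X1 = X[labels == 0], X[labels == 1]
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
fisher = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))[:, None]

# Leading PCA direction of overall variability (labels ignored).
Xc = X - X.mean(axis=0)
pca = np.linalg.svd(Xc, full_matrices=False)[2][0][:, None]

# Principal angle between the two one-dimensional subspaces, in degrees.
# Here it is close to 90, because the largest overall variability is not the
# between-class direction; this is the situation the proposed preliminary
# transformation is meant to correct before applying PCA.
print(np.degrees(subspace_angles(fisher, pca)))
```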
