Generalized canonical correlation analysis with missing values

Generalized canonical correlation analysis is a versatile technique that allows the joint analysis of several data matrices. The generalized canonical correlation analysis solution can be obtained through an eigenequation, and no distributional assumptions are required. In multiple-set data, some values are frequently missing. In this paper, two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. We assess the performance of the new methods by means of a simulation study and compare the results with those of two available methods: the missing-data passive method, introduced in Gifi’s homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll. An application to World Bank data illustrates the proposed methods.
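To make the eigenequation concrete, the following is a minimal sketch of generalized canonical correlation analysis for complete data only. It assumes the common formulation in which the group configuration is given by the leading eigenvectors of the sum of the projectors onto the column spaces of the centered sets. The function name `gcca_complete` and its parameters are hypothetical, and the sketch does not implement either of the missing-value methods introduced in the paper.

```python
import numpy as np

def gcca_complete(blocks, n_dims=2):
    """Complete-data GCCA sketch (assumed projector-sum formulation).

    blocks : list of (n, p_k) arrays observed on the same n objects.
    Returns the group configuration Y (n, n_dims), the corresponding
    eigenvalues, and one weight matrix per set.
    """
    n = blocks[0].shape[0]
    centered = [X - X.mean(axis=0) for X in blocks]  # column-center each set

    # Sum of orthogonal projectors onto the column space of each set.
    # pinv is used so that rank-deficient sets do not break the sketch.
    P_sum = np.zeros((n, n))
    for Xc in centered:
        P_sum += Xc @ np.linalg.pinv(Xc)
    P_sum = (P_sum + P_sum.T) / 2  # remove round-off asymmetry

    # Eigenequation: the leading eigenvectors give Y with Y'Y = I.
    eigvals, eigvecs = np.linalg.eigh(P_sum)
    order = np.argsort(eigvals)[::-1][:n_dims]
    Y = eigvecs[:, order]

    # Least-squares weights that map each set onto the group configuration.
    weights = [np.linalg.pinv(Xc) @ Y for Xc in centered]
    return Y, eigvals[order], weights
```

With K sets, each eigenvalue lies between 0 and K; a value close to K indicates a dimension that is almost perfectly reproduced by every set.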
