Analyzing Multivariate Data

This new edition of the book, originally written by J. D. Carroll and P. E. Green in 1978, covers classical multivariate techniques. It includes new examples, new data, and some new topics (such as binary and multinomial models, structural equation models with latent variables, scaling methods, and cluster analysis) with a fresh writing style, but the approach remains applications oriented. It is designed to appeal to researchers in marketing, psychology, and related fields. It comes with a CD containing the data used in the examples and in the exercises that appear at the end of the chapters, in various formats, including ASCII, Excel, Minitab, SAS xpt, SAS 7bdat, S-PLUS, SPSS, and Stata. The CD also contains software for some of the scaling models introduced in Chapter 7. The programs KYST3, SINDSCAL, and MDPREF are accompanied by their own brief documentation files (as well as sample input and output files); these files are executable code, however, and may not work on all systems. The authors have also developed student workbooks specific to particular software packages (e.g., SAS and SPSS) to accompany the textbook. (I did not have the chance to review these workbooks.)

As the authors mention, their book assumes familiarity with basic statistics (e.g., regression analysis and the use of matrix algebra), but its level varies from elementary discussions with no mathematical derivations to more advanced treatments that require some knowledge of multivariate statistical methods (e.g., maximum likelihood estimation, likelihood ratio tests, least squares estimation). An appendix with supplementary information on the statistical procedures used in the book would have been helpful to a reader not familiar with the underlying statistical theory.

The book is organized in three parts. Part I (Chaps. 1–3) provides an overview of multivariate methods, vectors, matrices, and regression. Part II (Chaps. 4–8) focuses on principal components, factor analysis, multidimensional scaling, and cluster analysis. Part III (Chaps. 9–13) covers canonical correlation, binomial and multinomial models, analysis of variance, and discriminant analysis. In every chapter, the authors discuss the objectives of each method and areas of potential application, and they seek to build intuition by giving the geometric interpretation of each method. Each chapter contains at least one real-world application, with a detailed discussion of most of the issues related to each method, including proper application and interpretation of results and some mathematical derivations. Most of the examples come from marketing, psychology, and related fields. Cross-validation and/or the bootstrap are suggested to validate results for each method; some further information on these and other related methods, such as sensitivity analysis (e.g., Saltelli, Chan, and Scott 2000), would have been nice. Each chapter ends with a summary and a few selected readings (I would have liked to see some more recent references), and there is a bibliography at the end of the book.

Chapter 1, “Introduction,” starts with a discussion about the nature of multivariate data and continues with a brief description of the methods covered in the book based on the type of data, the type of dependence structure for which each is used, and its purpose.

Chapter 2, “Vectors and Matrices,” includes a few basic definitions, such as vector, matrix, two-dimensional Cartesian coordinate system, and Euclidean distance. Its main objective is to enhance understanding of how subsequent multivariate techniques work. It focuses on the geometric representation of some vector and matrix operations, covering (1) multiplication of a matrix by a scalar as “stretching” or “shrinking” the size of a configuration of points, (2) multiplication of two vectors as a projection of one vector onto the other, (3) multiplication by an orthogonal matrix as a rotation of a point configuration to a new orientation that preserves its size and shape, (4) singular value decomposition of a matrix as an operation that combines “stretching” and “shrinking” transformations with orthogonal rotations, and (5) the matrix determinant as an area, in the two-dimensional case. I would have liked to see a more comprehensive treatment of the matrix algebra used in the book, as well as some discussion of graphical techniques (e.g., Krzanowski 2000).
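To make these geometric interpretations concrete, here is a minimal numpy sketch of my own (it is not code from the book or its CD, and the toy vectors and matrix are arbitrary):

```python
import numpy as np

x = np.array([3.0, 1.0])
y = np.array([2.0, 2.0])

# (2) Vector multiplication as projection: the scalar projection of x onto y
# is (x . y) / ||y||, and the projected vector lies along y.
proj_len = x @ y / np.linalg.norm(y)
proj_vec = (x @ y) / (y @ y) * y

# (4) Singular value decomposition: A = U diag(s) V', i.e., a rotation (V'),
# an axis-wise stretch/shrink (diag(s)), and another rotation (U).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# (5) The absolute determinant as an area: |det(A)| is the area of the image
# of the unit square under the mapping A.
print(abs(np.linalg.det(A)))  # 5.0 for this A
```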
Chapter 3, “Regression Analysis,” gives an overview of the regression problem. It describes the process of determining the regression line and the least squares estimator for simple and multiple linear regression, under the usual assumptions of independent errors with mean 0 and constant variance. It covers subjects related to the regression problem, including variable selection, R² as a measure of goodness of fit, the F test under the normal distribution assumption, multicollinearity, heteroscedasticity and autocorrelation of errors, influential observations, prediction errors, and model validation.

Chapter 4, “Principal Component Analysis,” explores principal component analysis as a dimension-reduction technique, as well as a method for identifying patterns of association among variables (principal component loadings). The authors first discuss the intuition underlying this method and then give the mathematical derivation of principal components (based on the correlation matrix) and principal component loadings. Bartlett’s sphericity test is suggested to determine whether the data are sufficiently highly correlated to warrant reducing dimensionality; the authors comment on the limitations of such testing because of the chi-squared test’s sensitivity to sample size. Questions related to the principal component method, such as the scaling of the data and the number of principal components to use, are also addressed. The following methods for choosing the number of components are suggested: the scree plot (a graphical approach), Kaiser’s rule (keep the components with eigenvalues exceeding unity), and Horn’s procedure (parallel analysis: keep the components whose eigenvalues exceed those obtained from comparable random data); the first two rules are illustrated in a short sketch below.

In Chapter 5, “Exploratory Factor Analysis,” the conceptual differences between factor analysis and principal component analysis are emphasized, and a small sketch contrasting the two methods appears below. In the development of the chapter, attention is paid to the use of rotation for facilitating the interpretation of a solution. First, a discussion is given using examples with one- and two-common-factor models, followed by a formal treatment of the estimation of the communalities. Questions including the number of common factors needed, the effect on the communalities of omitting important variables, the orthogonal and oblique rotation of a factor solution for easier interpretation, and the nonuniqueness of solutions are addressed. Factor scores, the locations of the original observations in the reduced factor space, are also presented, and a discussion is given on how one can use the results for a subsequent analysis.
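As a concrete companion to the component-retention rules mentioned for Chapter 4, the following short sketch (my own simulation, not an example from the book) computes the eigenvalues of the correlation matrix and applies Kaiser’s rule; a scree plot would simply graph the printed eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
# Two latent factors plus noise, so roughly two components should dominate.
scores = rng.standard_normal((n, 2))
loadings = rng.standard_normal((2, p))
X = scores @ loadings + 0.5 * rng.standard_normal((n, p))

# Principal components derived from the correlation matrix, as in the book's
# development: the eigenvalues are the variances of the components.
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

print("eigenvalues (scree):", np.round(eigvals, 2))
print("Kaiser's rule keeps:", int((eigvals > 1.0).sum()), "components")
# Horn's parallel analysis would compare these eigenvalues against the
# eigenvalues of random data of the same size; that step is omitted here.
```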
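The contrast drawn in Chapter 5 between factor analysis and principal components can likewise be sketched in a few lines; this uses scikit-learn (version 0.24 or later for the rotation option) rather than any software tied to the book, and the simulated data are again my own:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 6)) \
    + 0.5 * rng.standard_normal((200, 6))

# PCA decomposes total variance; factor analysis models shared variance
# through common factors plus per-variable noise terms.
pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print("PCA loadings:\n", pca.components_.T.round(2))
print("varimax factor loadings:\n", fa.components_.T.round(2))
# Factor scores place each observation in the reduced factor space.
print("factor scores shape:", fa.transform(X).shape)  # (200, 2)
```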
Chapter 6, “Confirmatory Factor Analysis,” explains how one can test a prior notion regarding which variables load on which factors, to verify whether it is consistent with the patterns in the data. A solution based on asymptotic maximum likelihood estimates (MLEs) of the parameters of a confirmatory factor analysis model, and the goodness of fit of different factor models, is presented. Details are provided about the derivation of the MLEs based on the multivariate normal distribution and about the testing of nested models. Issues related to this technique are explained using examples.

Chapter 7, “Multidimensional Scaling,” presents various methods for identifying spatial patterns of similarities for data that capture the proximities between pairs of objects from the same set, as well as from different sets. Each method is illustrated using a different dataset, and a detailed discussion of applications is provided. In the case of metric MDS (for data that reflect physical distances), Torgerson’s approach is presented. For nonmetric scaling (in the case of ordinal data), Kruskal’s iterative approach is discussed; both variants are illustrated in the first sketch below. In addition, the model for individual differences scaling developed by Carroll and Chang in 1970 is presented. The problem of incomplete ranking of nonmetric data (analysis of preference via an “unfolding model” and MDPREF, a method based on least squares developed by Carroll and Chang in 1968) is also treated.

Chapter 8, “Cluster Analysis,” focuses on hierarchical (in particular, agglomerative) clustering and on partitioning methods; other approaches (e.g., overlapping and fuzzy clusters) are not considered. Different measures of distance, dissimilarity, and density are presented. Agglomerative (single-linkage) clustering and its properties are discussed, and alternatives (including complete linkage, average linkage, the centroid method, and Ward’s method) are considered. Next, the K-means clustering procedure is covered, including selection of the initial partition, its properties, the choice of the number of clusters, and interpretation; the second sketch below illustrates these methods. The final section
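For Chapter 7, the metric/nonmetric distinction is easy to demonstrate with generic software; the sketch below uses scikit-learn’s MDS rather than the KYST3 or MDPREF programs on the CD, with a made-up dissimilarity matrix:

```python
import numpy as np
from sklearn.manifold import MDS

# Symmetric dissimilarities among four objects (zero diagonal).
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.0],
              [5.0, 4.0, 0.0, 2.0],
              [6.0, 5.0, 2.0, 0.0]])

# Metric MDS treats the entries as actual distances (Torgerson's goal);
# nonmetric MDS (Kruskal) preserves only their rank order.
metric = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
nonmetric = MDS(n_components=2, metric=False,
                dissimilarity="precomputed", random_state=0)

print(metric.fit_transform(D).round(2))
print(nonmetric.fit_transform(D).round(2))
print("stress (nonmetric):", round(float(nonmetric.stress_), 3))
```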
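Similarly for Chapter 8, the linkage alternatives and K-means can be sketched with scipy and scikit-learn on made-up two-dimensional points (none of this reflects the book’s own examples):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),   # one tight cloud
               rng.normal(3.0, 0.3, (20, 2))])  # a second, well separated

# Agglomerative clustering under the linkages the chapter compares.
for method in ("single", "complete", "average", "centroid", "ward"):
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(method, "cluster sizes:", np.bincount(labels)[1:])

# K-means partitioning; n_init controls how many random initial partitions
# are tried, one of the practical issues the chapter raises.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("k-means cluster sizes:", np.bincount(km.labels_))
```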
