A Unifying Tool for Linear Multivariate Statistical Methods: The RV‐Coefficient

Consider two data matrices on the same sample of n individuals, X (p × n) and Y (q × n). From these matrices, geometrical representations of the sample are obtained as two configurations of n points, in R^p and R^q. It is shown that the RV‐coefficient (Escoufier, 1970, 1973) can be used as a measure of similarity of the two configurations, taking into account the possibly distinct metrics used on each to measure the distances between points. The purpose of this paper is to show that most classical methods of linear multivariate statistical analysis can be interpreted as the search for optimal linear transformations or, equivalently, the search for optimal metrics to apply to two data matrices on the same sample. Optimality is defined in terms of the similarity of the corresponding configurations of points, which, in turn, calls for the maximization of the associated RV‐coefficient. The methods studied are principal components, principal components of instrumental variables, multivariate regression, canonical variables, and discriminant analysis; they are differentiated by the possible relationships existing between the two data matrices involved and by the additional constraints under which the maximum of RV is to be obtained. It is also shown that the RV‐coefficient can be used as a measure of goodness of a solution to the problem of discarding variables.
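
As an illustration of the quantity discussed above, the following is a minimal sketch of the RV‐coefficient under simplifying assumptions that the abstract does not state: the variables are centered over the n individuals and the identity metric is used on both R^p and R^q. Under those assumptions, with W_X = X'X and W_Y = Y'Y (both n × n), RV = tr(W_X W_Y) / sqrt(tr(W_X^2) tr(W_Y^2)). The function name `rv_coefficient` and the use of NumPy are choices made for this sketch, not part of the paper.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV-coefficient between X (p x n) and Y (q x n).

    Columns are the n individuals, as in the abstract's notation.
    Assumptions of this sketch: identity metrics on both spaces,
    and each variable (row) is centered over the individuals.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim != 2 or Y.ndim != 2 or X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must be 2-D with the same number of columns (individuals)")

    # Center each variable across the n individuals.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)

    # n x n matrices of scalar products between individuals;
    # each one encodes a configuration of n points.
    Wx = Xc.T @ Xc
    Wy = Yc.T @ Yc

    num = np.trace(Wx @ Wy)
    den = np.sqrt(np.trace(Wx @ Wx) * np.trace(Wy @ Wy))
    if den == 0.0:
        raise ValueError("RV is undefined when a data matrix has no variance")
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(3, 20))  # p = 3 variables, n = 20 individuals
    Y = rng.normal(size=(2, 20))  # q = 2 variables, same individuals
    print("RV(X, Y) =", rv_coefficient(X, Y))
    print("RV(X, X) =", rv_coefficient(X, X))
```

Under these assumptions the value lies between 0 and 1 and is unchanged when either configuration is rotated or uniformly rescaled, which is consistent with reading RV as a similarity between two configurations of points rather than between the raw coordinates.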