Re‐evaluating the role of the Mahalanobis distance measure

It is shown that the sum of squares of the standardised scores of all non‐zero principal components (PCs) equals the squared Mahalanobis distance. A new distance measure, the reduced Mahalanobis distance, is explored in which fewer PCs are retained than in the full rank model. The measure is illustrated with both one‐class and two‐class classifiers. Linear discriminant analysis can be employed as a soft model, and principal component analysis using the pooled variance‐covariance matrix is introduced as an intermediate view between conjoint and disjoint models, allowing linear discriminant analysis to be used on these reduced rank models. By choosing the most discriminatory PCs, it can be shown that the reduced Mahalanobis distance outperforms the full rank model for discrimination via soft models. Copyright © 2016 John Wiley & Sons, Ltd.
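The identity stated above can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' implementation: the function names, the simulated data, and the choice of the first two PCs for the reduced distance are assumptions made purely for illustration. It computes the squared Mahalanobis distance of each sample to the centroid in the usual way, and compares it with the sum of squared PC scores after each score is scaled to unit sample variance.

```python
import numpy as np


def standardised_pc_scores(X):
    """PC scores of the rows of X, each component scaled to unit sample variance."""
    Xc = X - X.mean(axis=0)                      # centre on the sample mean
    n = X.shape[0]
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s.max()                   # drop numerically zero PCs
    # Raw scores are U * s with sample standard deviation s / sqrt(n - 1),
    # so the standardised scores reduce to U * sqrt(n - 1).
    return U[:, keep] * np.sqrt(n - 1)


def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row of X to the sample centroid."""
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", Xc, S_inv, Xc)


def reduced_squared_mahalanobis(X, components):
    """Sum of squared standardised scores over a chosen subset of PCs.

    `components` is a hypothetical argument: a list of PC indices to retain.
    """
    Z = standardised_pc_scores(X)
    return (Z[:, components] ** 2).sum(axis=1)


# Simulated correlated data (assumption: 50 samples, 4 variables, full rank).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

Z = standardised_pc_scores(X)
d2_full = squared_mahalanobis(X)                 # classical definition
d2_pcs = (Z ** 2).sum(axis=1)                    # all non-zero PCs
d2_reduced = reduced_squared_mahalanobis(X, [0, 1])

print(np.allclose(d2_full, d2_pcs))
```

Because each retained PC contributes a non-negative term, the reduced distance can never exceed the full rank distance for the same sample. Which PCs to retain is a modelling decision; the sketch simply takes the first two, whereas the abstract refers to choosing the most discriminatory ones.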
