Re‐centered kurtosis as a projection pursuit index for multivariate data analysis

High‐dimensional data, which have become common in analytical chemistry, are often rich in information, but that information may go undiscovered unless advanced data analysis methods are applied. Projection pursuit (PP) is a powerful tool for exploratory data analysis, yet it is less widely used in chemistry than methods such as principal component analysis (PCA), although PP often gives better results than PCA. PP does not have a uniquely defined objective function (projection index), and various statistics have been proposed as projection indices. Kurtosis has been widely employed as a projection index, and minimizing it helps to reveal clusters. However, this approach often fails when the clusters in a data set are unbalanced (i.e., present in unequal proportions). In this work, a newly defined kurtosis, referred to as “re‐centered kurtosis,” is proposed as a projection index. The theory and optimization algorithms for the re‐centered kurtosis are developed. The utility of PP with the re‐centered kurtosis as a projection index for revealing unbalanced clusters is demonstrated on simulated and real experimental data. Copyright © 2013 John Wiley & Sons, Ltd.
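
The failure of ordinary kurtosis on unbalanced clusters can be illustrated with a small numerical sketch. It is not taken from the paper: the data generator, the cluster sizes, the separation of 6, and the helper names (`kurtosis`, `two_cluster_data`, `kurtosis_along`) are all assumptions chosen for illustration, and the re‐centered kurtosis itself is not implemented because the abstract does not define it. The sketch compares the ordinary kurtosis of a projection onto the cluster‐separating direction with that of a projection onto a pure‐noise direction.

```python
import numpy as np


def kurtosis(z):
    """Ordinary (non-excess) kurtosis: fourth central moment / variance squared."""
    d = z - z.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2


def two_cluster_data(n_a, n_b, separation=6.0, seed=0):
    """Hypothetical 2-D data: two unit-variance Gaussian clusters split along axis 0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_a + n_b, 2))
    x[n_a:, 0] += separation  # shift the second cluster along the first axis
    return x


def kurtosis_along(x, angle_deg):
    """Kurtosis of the data projected onto a unit vector at the given angle."""
    t = np.deg2rad(angle_deg)
    v = np.array([np.cos(t), np.sin(t)])
    return kurtosis(x @ v)


balanced = two_cluster_data(500, 500)    # clusters in equal proportions
unbalanced = two_cluster_data(950, 50)   # clusters in a 95:5 ratio

for name, data in [("balanced", balanced), ("unbalanced", unbalanced)]:
    k_sep = kurtosis_along(data, 0.0)     # direction that separates the clusters
    k_noise = kurtosis_along(data, 90.0)  # direction containing only Gaussian noise
    print(f"{name}: separating direction {k_sep:.2f}, noise direction {k_noise:.2f}")
```

For the balanced case, the separating direction has a kurtosis well below the Gaussian value of 3, so minimizing kurtosis finds it. For the 95:5 case, the same direction has a kurtosis well above 3, so a minimizer prefers the uninformative noise direction instead.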
