An Empirical Comparison of Variable Standardization Methods in Cluster Analysis.

It is common practice in marketing research to standardize the columns of a persons-by-variables data matrix (to mean zero and unit standard deviation) before clustering the entities corresponding to the rows of that matrix. This practice is often followed even when the columns are all expressed in similar units, such as ratings on a 7-point, equal-interval scale. This study examines six different ways of standardizing matrix columns and compares them with the null case of no column standardization. The analysis is replicated for ten large-scale data sets comprising derived importances of conjoint-based attributes. Our findings indicate that the prevailing column standardization practice may be problematic for some kinds of data that marketing researchers use for segmentation. However, we also find that in the background data profiling step, results are reasonably robust to the choice of column standardization method.
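
As a rough illustration of why the prevailing practice can matter for same-unit data, the sketch below applies the mean-zero, unit-standard-deviation standardization described above to a small persons-by-variables matrix and compares inter-person distances with the unstandardized (null) case. The ratings are hypothetical, the helper names `zscore_columns` and `euclidean` are illustrative, and the use of the population standard deviation and Euclidean distance are assumptions; none of these are taken from the study, and the other five standardization methods it examines are not shown.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardize each column to mean zero and unit standard deviation."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    # Assumption: population SD (divisor n); the divisor used in the study is not stated.
    sds = [pstdev(col) for col in columns]
    return [
        # A constant column has SD 0; map it to 0.0 rather than divide by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in matrix
    ]


def euclidean(a, b):
    """Euclidean distance between two rows (persons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical 7-point ratings: 4 persons (rows) by 2 variables (columns).
# Variable 1 spreads across the whole scale; variable 2 barely varies.
ratings = [
    [1, 4],
    [7, 4],
    [1, 5],
    [7, 5],
]

standardized = zscore_columns(ratings)

# Null case: persons 0 and 1 differ far more than persons 0 and 2.
raw_d01 = euclidean(ratings[0], ratings[1])
raw_d02 = euclidean(ratings[0], ratings[2])

# After standardization both variables carry equal weight,
# so the two distances become the same.
std_d01 = euclidean(standardized[0], standardized[1])
std_d02 = euclidean(standardized[0], standardized[2])

print(raw_d01, raw_d02)
print(std_d01, std_d02)
```

In this toy matrix, standardization rescales the low-variance column until it counts as much as the high-variance one, even though both were measured on the same 7-point scale. Whether that rescaling helps or hurts cluster recovery is the empirical question the study addresses.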