A comparison of two dissimilarity functions for mixed-type predictor variables in the δ-machine

The δ-machine is a statistical learning tool for classification based on dissimilarities or distances between the profiles of observations and the profiles of a representation set; it was proposed by Yuan et al. (J Classif 36(3): 442–470, 2019). So far, the δ-machine has been restricted to continuous predictor variables. In this article, we extend the δ-machine to handle continuous, ordinal, nominal, and binary predictor variables. We use a tailored dissimilarity function for mixed-type variables defined by Gower, which has the properties of a Manhattan distance, and we develop, in a similar vein, a Euclidean dissimilarity function for mixed-type variables. In simulation studies, we compare the performance of the two dissimilarity functions, and we compare the predictive performance of the δ-machine with that of logistic regression models. Data were generated according to two population distributions, varying the type of predictor variables, the distribution of the categorical variables, and the number of predictor variables. We investigated the performance of the δ-machine using the two dissimilarity functions and different types of representation set. The simulation studies showed that the adjusted Euclidean dissimilarity function performed better than the adjusted Gower dissimilarity function; that the δ-machine outperformed logistic regression; and that, for constructing the representation set, K-medoids clustering yielded fewer active exemplars than K-means clustering while maintaining accuracy. We also applied the δ-machine to an empirical example, discussed its interpretation in detail, and compared its classification performance with that of five other classification methods. The results showed that the δ-machine strikes a good balance between accuracy and interpretability.
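For concreteness, a sketch of the two dissimilarity functions follows. The paper's exact "adjusted" definitions are not reproduced in this abstract, so the formulas below follow the standard Gower coefficient of reference [24] and a Euclidean analogue in the spirit described above; the treatment of ordinal variables (e.g., by rank scoring) is an additional convention omitted here. For observations i and j measured on p predictor variables, define a scaled per-variable difference and the two dissimilarities as

$$
\delta_{ijk} =
\begin{cases}
|x_{ik} - x_{jk}|\,/\,R_k, & \text{variable } k \text{ continuous, with range } R_k,\\
1[x_{ik} \neq x_{jk}], & \text{variable } k \text{ binary or nominal},
\end{cases}
\qquad
d^{\mathrm{Gower}}_{ij} = \frac{1}{p}\sum_{k=1}^{p} \delta_{ijk},
\qquad
d^{\mathrm{Euclid}}_{ij} = \Bigl(\frac{1}{p}\sum_{k=1}^{p} \delta_{ijk}^{2}\Bigr)^{1/2}.
$$

The first is a range-normalized Manhattan distance averaged over the variables (the Manhattan property noted above); the second replaces the average of absolute differences by a root mean of squares. A minimal Python sketch under the same assumptions (function names and the toy profiles are illustrative, not from the paper):

```python
import numpy as np

def scaled_diffs(x_i, x_j, ranges, is_cat):
    """Per-variable scaled differences delta_ijk in [0, 1]:
    categorical -> 0/1 mismatch; continuous -> |x_ik - x_jk| / R_k."""
    d = np.empty(len(x_i))
    for k, cat in enumerate(is_cat):
        if cat:
            d[k] = 0.0 if x_i[k] == x_j[k] else 1.0
        else:
            d[k] = abs(x_i[k] - x_j[k]) / ranges[k]
    return d

def gower_dissimilarity(x_i, x_j, ranges, is_cat):
    # Manhattan-type: average of the scaled differences (Gower, 1971)
    return float(scaled_diffs(x_i, x_j, ranges, is_cat).mean())

def euclidean_dissimilarity(x_i, x_j, ranges, is_cat):
    # Euclidean-type analogue: root mean of the squared scaled differences
    d = scaled_diffs(x_i, x_j, ranges, is_cat)
    return float(np.sqrt(np.mean(d ** 2)))

# Toy mixed-type profiles: (age, sex, chest-pain type); age is continuous
# with an assumed range of 40, the other two variables are categorical.
ranges = [40.0, None, None]
is_cat = [False, True, True]
a = [52.0, "m", "typical"]
b = [61.0, "f", "typical"]
print(gower_dissimilarity(a, b, ranges, is_cat))      # (9/40 + 1 + 0) / 3 ~= 0.408
print(euclidean_dissimilarity(a, b, ranges, is_cat))  # sqrt((0.225^2 + 1) / 3) ~= 0.592
```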

[1] Brian D. Ripley et al. Pattern Recognition and Neural Networks, 1996.

[2] Joshua Zhexue Huang et al. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, 1998, Data Mining and Knowledge Discovery.

[3] R. Tibshirani et al. Lasso and Elastic-Net Regularized Generalized Linear Models [R package glmnet version 4.0-2], 2020.

[4] R. Detrano et al. International application of a new probability algorithm for the diagnosis of coronary artery disease, 1989, The American Journal of Cardiology.

[5] J. Meulman et al. ROS Regression: Integrating Regularization with Optimal Scaling Regression, 2016, Statistical Science.

[6] Kelly Trezise et al. Informative tools for characterizing individual differences in learning: Latent class, latent profile, and latent transition analysis, 2017, Learning and Individual Differences.

[7] P. Jaccard. The Distribution of the Flora in the Alpine Zone, 1912.

[8] Geoffrey I. Webb et al. MultiBoosting: A Technique for Combining Boosting and Wagging, 2000, Machine Learning.

[9] L. R. Bergman et al. A person-oriented approach in research on developmental psychopathology, 1997, Development and Psychopathology.

[10] Gavin Brown et al. Diversity in neural network ensembles, 2004.

[11] Panayiotis E. Pintelas et al. Combining Bagging and Boosting, 2007.

[12] Leo Breiman et al. Bagging Predictors, 1996, Machine Learning.

[13] Tom Fawcett et al. An introduction to ROC analysis, 2006, Pattern Recognition Letters.

[14] Robert P. W. Duin et al. A Generalized Kernel Approach to Dissimilarity-based Classification, 2002, Journal of Machine Learning Research.

[15] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[16] Boris G. Mirkin et al. Concept Learning and Feature Selection Based on Square-Error Clustering, 1999, Machine Learning.

[17] Trevor Hastie et al. An Introduction to Statistical Learning, 2013, Springer Texts in Statistics.

[18] R. Detrano et al. Bayesian probability analysis: a prospective demonstration of its clinical utility in diagnosing coronary disease, 1984, Circulation.

[19] Leo Breiman et al. Random Forests, 2001, Machine Learning.

[20] Jacob Cohen. Eta-Squared and Partial Eta-Squared in Fixed Factor ANOVA Designs, 1973, Educational and Psychological Measurement.

[21] Trevor F. Cox et al. Multidimensional Scaling, Second Edition, 2000.

[22] Kemal Polat et al. The Medical Applications of Attribute Weighted Artificial Immune System (AWAIS): Diagnosis of Heart and Diabetes Diseases, 2005, ICARIS.

[23] Chih-Jen Lin et al. A Practical Guide to Support Vector Classification, 2008.

[24] J. Gower. A General Coefficient of Similarity and Some of Its Properties, 1971.

[25] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[26] Robert P. W. Duin et al. The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, 2005, Series in Machine Perception and Artificial Intelligence.

[27] Beibei Yuan et al. The δ-Machine: Classification Based on Distances Towards Prototypes, 2019, Journal of Classification.

[28] Corinna Cortes et al. Support-Vector Networks, 1995, Machine Learning.

[29] Peter J. Rousseeuw et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[30] Patrick J. F. Groenen et al. Modern Multidimensional Scaling: Theory and Applications, 2003.

[31] K. Shigemasu et al. Bayesian multidimensional scaling for the estimation of a Minkowski exponent, 2010, Behavior Research Methods.

[32] Leo Breiman et al. Classification and Regression Trees, 1984.

[33] Trevor Hastie et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[34] D. Opitz et al. Popular Ensemble Methods: An Empirical Study, 1999, Journal of Artificial Intelligence Research.

[35] M. Brusco et al. Choosing the number of clusters in K-means clustering, 2011, Psychological Methods.

[36] Douglas L. Medin et al. Context theory of classification learning, 1978.

[37] J. Maindonald. Statistical Learning from a Regression Perspective, 2008.