The δ-Machine: Classification Based on Distances Towards Prototypes

We introduce the δ-machine, a statistical learning tool for classification based on (dis)similarities between the profiles of the observations and the profiles of a representation set consisting of prototypes. In this article, we discuss the properties of the δ-machine, propose an automatic decision rule for choosing the number of clusters for the K-means method from a predictive perspective, and derive variable importance measures and partial dependence plots for the machine. We performed five simulation studies to investigate the properties of the δ-machine. The first three investigated the selection of prototypes, different (dis)similarity functions, and the definition of the representation set. The results indicate that the Lasso is the best choice for selecting prototypes, that the Euclidean distance is a good dissimilarity function, and that a small representation set of prototypes gives sparse but competitive results. The remaining two simulation studies investigated the performance of the δ-machine with imbalanced classes and with unequal covariance matrices for the two classes. The results show that the δ-machine is robust to class imbalance and that the four (dis)similarity functions performed the same regardless of the covariance matrices. We also compare the classification performance of the δ-machine with that of three other classification methods on ten real datasets from the UCI database, and discuss two empirical examples in detail.
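The idea of classifying in a dissimilarity space can be illustrated with a minimal sketch. It assumes the Euclidean distance as the dissimilarity function, a representation set drawn at random from each class (a stand-in for the K-means or full-training-set options discussed in the article), and a simple proximal-gradient solver for an L1-penalised logistic regression as the Lasso step. All function names, the toy data, and the tuning values are hypothetical and are not taken from the article.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(X, prototypes):
    """Map each observation to its distances towards every prototype."""
    return [[euclidean(x, p) for p in prototypes] for x in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(D, y, lam=0.05, lr=0.01, n_iter=300):
    """L1-penalised logistic regression fitted by proximal gradient descent.

    lam, lr and n_iter are illustrative values, not tuned by cross-validation.
    """
    n, m = len(D), len(D[0])
    b0, b = 0.0, [0.0] * m
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * m
        for row, yi in zip(D, y):
            r = sigmoid(b0 + sum(bj * dj for bj, dj in zip(b, row))) - yi
            g0 += r
            for j, dj in enumerate(row):
                g[j] += r * dj
        b0 -= lr * g0 / n  # the intercept is not penalised
        b = [soft_threshold(bj - lr * gj / n, lr * lam) for bj, gj in zip(b, g)]
    return b0, b


def predict(D, b0, b):
    """Assign class 1 when the linear predictor is positive, else class 0."""
    return [1 if b0 + sum(bj * dj for bj, dj in zip(b, row)) > 0 else 0 for row in D]


def make_data(n_per_class=40, seed=0):
    """Toy data (an assumption): two bivariate normal classes."""
    rng = random.Random(seed)
    X0 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_per_class)]
    X1 = [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(n_per_class)]
    return X0 + X1, [0] * n_per_class + [1] * n_per_class


X, y = make_data()
rng = random.Random(1)
# Candidate prototypes: five randomly chosen observations per class.
prototypes = rng.sample(X[:40], 5) + rng.sample(X[40:], 5)
D = dissimilarity_matrix(X, prototypes)
b0, b = fit_l1_logistic(D, y)
accuracy = sum(p == t for p, t in zip(predict(D, b0, b), y)) / len(y)
active = [j for j, bj in enumerate(b) if bj != 0.0]
print("training accuracy:", accuracy)
print("prototypes with non-zero weight:", active)
```

In this sketch the original two variables are never used directly by the classifier: the predictors are the distances in `D`, so each coefficient belongs to a prototype. Prototypes whose coefficient is shrunk to zero drop out of the representation set, which is the sense in which an L1 penalty selects prototypes.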
