Archetypal analysis for ordinal data

Abstract. Archetypoid analysis (ADA) is an exploratory technique that explains a set of continuous observations as mixtures of pure (extreme) patterns. These patterns, called archetypoids, are actual observations from the sample, which makes the results easy to interpret, even for non-experts. Each observation is approximated as a convex combination of the archetypoids. In its current form, ADA cannot be applied directly to ordinal data. We propose and describe a two-step method for applying ADA to ordinal responses, based on the ordered stereotype model. A main advantage of this model is that it converts the ordinal data to numerical values using a new data-driven spacing that better reflects the ordinal patterns in the data; ADA can then be applied directly to the converted values. Results of the new method are presented for two behavioural science applications. Finally, the proposed method is compared with other unsupervised statistical learning methods.
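
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything named in it is an assumption made for illustration: the score vector `PHI` is hypothetical (in practice the scores would be estimated by fitting the ordered stereotype model), the two archetypoids are picked by hand rather than found by ADA, and the convex weights are computed by projected gradient descent, a simple stand-in for whatever solver an ADA implementation actually uses.

```python
from typing import List, Sequence

# Hypothetical scores for a 5-category ordinal scale (assumed, not estimated).
# They are non-decreasing, start at 0 and end at 1, and are unevenly spaced.
PHI = [0.0, 0.10, 0.55, 0.80, 1.0]


def to_scores(row: Sequence[int], phi: Sequence[float] = PHI) -> List[float]:
    """Step 1: replace each ordinal category 1..q by its numerical score."""
    return [phi[c - 1] for c in row]


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def convex_weights(x: Sequence[float],
                   archetypoids: Sequence[Sequence[float]],
                   steps: int = 500) -> List[float]:
    """Step 2, for one observation: weights alpha on the simplex that
    minimise ||x - sum_j alpha_j * z_j||^2 (projected gradient descent)."""
    k = len(archetypoids)
    alpha = [1.0 / k] * k
    # Step size 1/L, where L = 2 * (sum of squared entries) bounds the
    # Lipschitz constant of the gradient.
    lipschitz = 2.0 * sum(z_d * z_d for z in archetypoids for z_d in z) or 1.0
    for _ in range(steps):
        fitted = [sum(a * z[d] for a, z in zip(alpha, archetypoids))
                  for d in range(len(x))]
        residual = [f - x_d for f, x_d in zip(fitted, x)]
        grad = [2.0 * sum(r * z_d for r, z_d in zip(residual, z))
                for z in archetypoids]
        alpha = project_to_simplex(
            [a - g / lipschitz for a, g in zip(alpha, grad)])
    return alpha


# Toy questionnaire: three items answered on a 1..5 scale (made-up data).
responses = [[1, 1, 2], [5, 5, 4], [3, 3, 3], [2, 1, 2]]
scored = [to_scores(r) for r in responses]

# Hand-picked archetypoids: two actual rows of the sample (an assumption;
# ADA would search for the rows that minimise the total approximation error).
archetypoids = [scored[0], scored[1]]
weights = [convex_weights(x, archetypoids) for x in scored]
```

Each entry of `weights` is non-negative and sums to one, so it can be read as the share of each extreme respondent in the corresponding observation. A row that is itself an archetypoid receives essentially all of its weight on itself.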
