Copula Archetypal Analysis

We present an extension of classical archetypal analysis (AA). It is motivated by the observation that classical AA is not invariant under strictly monotone increasing transformations of the data. Such invariance is desirable because it makes AA independent of the chosen measurement scale: representing a data set in meters or in log(meters) should lead to approximately the same archetypes. We achieve this invariance by introducing a semi-parametric Gaussian copula, which also makes AA more robust against outliers and missing values. Furthermore, our framework can handle mixed discrete/continuous data, which is arguably the most widely encountered type of data in real-world applications. Since the proposed extension takes the form of a preprocessing step, existing classical AA models can be updated with little effort.
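The invariance argument can be illustrated with a minimal sketch. The code below uses a plain rank-based normal-scores transform as a simplified stand-in for a semi-parametric Gaussian copula preprocessing step; it is an assumption made for illustration and not necessarily the estimator used in this work. The function name `normal_scores` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(column):
    """Map one data column to Gaussian scores using only its ranks.

    Illustrative stand-in for a copula-style preprocessing step:
    tied values receive the average rank of their tie group.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over all entries tied with the entry at position i.
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Dividing by n + 1 keeps the pseudo-observations strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Hypothetical measurements, once in meters and once in log(meters).
meters = [1.2, 3.5, 0.7, 10.0, 2.2]
log_meters = [math.log(x) for x in meters]

scores_m = normal_scores(meters)
scores_log = normal_scores(log_meters)
```

Because a strictly increasing transformation such as the logarithm leaves the ranks unchanged, `scores_m` and `scores_log` coincide, so any AA model fitted to the transformed columns sees the same input in both cases. The single large value 10.0 is also mapped to a moderate Gaussian score, which hints at why a rank-based step can dampen the influence of outliers.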
