Relaxed Exponential Kernels for Unsupervised Learning

Many unsupervised learning algorithms make use of kernels that rely on the Euclidean distance between two samples. However, the Euclidean distance is optimal for Gaussian-distributed data. In this paper, we relax the global Gaussian assumption made by the Euclidean distance and propose local Gaussian modelling of the immediate neighbourhood of each sample, resulting in an augmented data space formed by the parameters of the local Gaussians. To this end, we propose a convolution kernel for the augmented data space. The factorisable nature of this kernel allows us to introduce (semi-)metrics for this space, from which we derive relaxed versions of known kernels. We present empirical results that validate the utility of the proposed localized approach in the context of spectral clustering. The key result of this paper is that combining the local Gaussian model with measures that adhere to metric properties yields much better performance across different spectral clustering tasks.
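
To make the idea of an augmented data space concrete, the following is a minimal sketch and not the kernel proposed in the paper. It assumes that each local Gaussian is fitted to the k nearest neighbours of a sample, and it swaps in the Bhattacharyya distance between Gaussians as a stand-in measure, turned into a similarity by an exponential. The function names, the neighbourhood size `k` and the regulariser `reg` are hypothetical choices made for illustration.

```python
import numpy as np


def local_gaussians(X, k=10, reg=1e-3):
    """Fit one Gaussian (mean, covariance) to each sample's k nearest neighbours.

    The neighbourhood includes the sample itself. `reg` is an assumed ridge
    term that keeps every local covariance invertible.
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(sq, axis=1)[:, :k]
    means = np.empty((n, d))
    covs = np.empty((n, d, d))
    for i in range(n):
        nb = X[idx[i]]
        means[i] = nb.mean(axis=0)
        centered = nb - means[i]
        covs[i] = centered.T @ centered / k + reg * np.eye(d)
    return means, covs


def bhattacharyya_kernel(means, covs):
    """Similarity exp(-D_B) between all pairs of local Gaussians.

    D_B is the Bhattacharyya distance between two Gaussians; this is an
    illustrative stand-in, not the convolution kernel of the paper.
    """
    n = means.shape[0]
    logdets = np.linalg.slogdet(covs)[1]
    K = np.empty((n, n))
    for i in range(n):
        S = 0.5 * (covs[i] + covs)                 # averaged covariances, (n, d, d)
        diff = means - means[i]                    # mean differences, (n, d)
        sol = np.linalg.solve(S, diff[:, :, None])[:, :, 0]
        maha = np.einsum("nd,nd->n", diff, sol)    # Mahalanobis-type term
        logdet_S = np.linalg.slogdet(S)[1]
        dist = 0.125 * maha + 0.5 * (logdet_S - 0.5 * (logdets[i] + logdets))
        K[i] = np.exp(-dist)
    return K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))
    means, covs = local_gaussians(X, k=8)
    K = bhattacharyya_kernel(means, covs)
    print(K.shape)
```

A matrix built this way could, for instance, be passed as a precomputed affinity to a spectral clustering routine such as scikit-learn's `SpectralClustering(affinity="precomputed")`. Whether this reproduces the behaviour reported in the paper is not something the sketch can show.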
