Choosing the Metric in High-Dimensional Spaces Based on Hub Analysis

To avoid the undesired effects of distance concentration in high-dimensional spaces, previous work has advocated the use of fractional ℓp norms instead of the ubiquitous Euclidean norm. Closely related to concentration is the emergence of hub and anti-hub objects. Hub objects have a small distance to an exceptionally large number of data points, while anti-hubs lie far from all other data points. The contribution of this work is an empirical examination of concentration and hubness, resulting in an unsupervised approach for choosing an ℓp norm by minimizing hubs while simultaneously maximizing nearest neighbor classification accuracy.
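The idea of picking a norm by minimizing hubs can be illustrated with a small sketch. It is not the authors' implementation: it assumes that hubness is quantified as the skewness of the k-occurrence counts (how often each point appears among the k nearest neighbors of other points), and every function name, the candidate values of p, and the choice k=5 are hypothetical. Only the unsupervised hubness criterion is covered; the classification side is left out.

```python
import random


def lp_distance(x, y, p):
    """Minkowski-style dissimilarity; for 0 < p < 1 it is 'fractional' and not a true norm."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def k_occurrence(data, p, k):
    """Count how often each point appears in the k-nearest-neighbor lists of the other points."""
    n = len(data)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point, sorted ascending.
        dists = sorted(
            (lp_distance(data[i], data[j], p), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; larger positive values indicate stronger hubs."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def least_hubby_p(data, candidates, k=5):
    """Return the candidate p with the lowest k-occurrence skewness, plus all scores."""
    scores = {p: skewness(k_occurrence(data, p, k)) for p in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data: 60 points in 50 dimensions (an assumption for illustration).
    data = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(60)]
    best_p, scores = least_hubby_p(data, [0.25, 0.5, 1.0, 2.0, 4.0])
    print("hubness (skewness) per p:", scores)
    print("p with the least hubness:", best_p)
```

Because the selection uses only neighbor counts and no class labels, it stays unsupervised; the brute-force neighbor search is quadratic in the number of points and is meant for small experiments only.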
