HUBNESS-AWARE OUTLIER DETECTION FOR MUSIC GENRE RECOGNITION

Outlier detection is the task of automatically identifying unknown data not covered by the training data (e.g., a new genre in genre recognition). We explore outlier detection in the presence of hubs and anti-hubs, i.e., data objects that appear to be either very close to or very far from most other data because of a problem with measuring distances in high dimensions. On two standard music genre data sets, we compare a classic distance-based method with two new approaches designed to counter the negative effects of hubness. We demonstrate that anti-hubs are responsible for many detection errors and that detection performance can be improved by using a hubness-aware approach.
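
To make the notions of hub and anti-hub concrete, the following is a minimal sketch, not the method evaluated in this work. It counts how often each object appears in the k-nearest-neighbor lists of the other objects (its k-occurrence) and also computes the distance to the k-th nearest neighbor, a common distance-based outlier score. The function name `k_occurrence`, the Euclidean distance, the synthetic Gaussian data, and the hub threshold of `2 * k` are all assumptions made for illustration.

```python
import numpy as np

def k_occurrence(X, k=5):
    """Return (k-occurrence counts, distance to the k-th nearest neighbor).

    Hypothetical helper for illustration; Euclidean distance is assumed.
    """
    # Pairwise Euclidean distances between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # an object is not its own neighbor

    # Indices of the k nearest neighbors of every object.
    knn = np.argsort(D, axis=1)[:, :k]

    # k-occurrence: how often each object shows up in other objects' lists.
    counts = np.bincount(knn.ravel(), minlength=X.shape[0])

    # Distance to the k-th nearest neighbor (larger means more outlying).
    kth_dist = np.sort(D, axis=1)[:, k - 1]
    return counts, kth_dist

# Synthetic high-dimensional data (assumption: i.i.d. Gaussian features).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
k = 5
counts, kth_dist = k_occurrence(X, k=k)

anti_hubs = np.flatnonzero(counts == 0)   # never anyone's neighbor
hubs = np.flatnonzero(counts > 2 * k)     # illustrative threshold only
print(len(hubs), "hubs,", len(anti_hubs), "anti-hubs")
```

Objects with a k-occurrence of zero are anti-hubs in this sketch: they are far from most other data and therefore tend to receive large distance-based outlier scores even when they belong to a known class, which is the kind of detection error a hubness-aware approach aims to reduce.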
