Hubness as a Case of Technical Algorithmic Bias in Music Recommendation

This paper aims to bring the problem of technical algorithmic bias to the attention of the high-dimensional data mining community. A system suffering from algorithmic bias treats certain users or data in a systematically unfair way; technical algorithmic bias is the variant that arises specifically from technical constraints. We illustrate this problem, which has so far been neglected in high-dimensional data mining, using a real-world music recommendation system. Because of a problem in measuring distances in high-dimensional spaces, songs closer to the center of all data are recommended over and over again, while songs far from the center are not recommended at all. We show that these so-called hub songs do not carry a specific semantic meaning, and that deleting them from the database merely promotes other songs to hub songs, which are then recommended disturbingly often. We argue that it is the ethical responsibility of data mining researchers to care about the fairness of their algorithms in high-dimensional spaces.
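The hub effect described above can be reproduced on synthetic data. The following is a minimal sketch, not the authors' method or data: it assumes i.i.d. Gaussian vectors in place of song features, plain Euclidean distance, and a hypothetical helper `k_occurrence` that counts how often each point appears in the k-nearest-neighbor lists of all other points (a stand-in for "how often a song is recommended").

```python
import numpy as np

def k_occurrence(X, k=5):
    """Count how often each row of X appears among the k nearest
    neighbors of the other rows (Euclidean distance)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    # Indices of the k smallest distances in each row (order within k is arbitrary)
    knn = np.argpartition(d2, k, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=X.shape[0])

# Assumption: synthetic Gaussian data stands in for real song features.
rng = np.random.default_rng(0)
n_points, n_dims, k = 1000, 100, 5
X = rng.standard_normal((n_points, n_dims))

counts = k_occurrence(X, k=k)
dist_to_center = np.linalg.norm(X - X.mean(axis=0), axis=1)
corr = np.corrcoef(dist_to_center, counts)[0, 1]

print("mean k-occurrence:", counts.mean())
print("max k-occurrence:", counts.max())
print("points never recommended:", int(np.sum(counts == 0)))
print("correlation(distance to center, k-occurrence):", corr)
```

Under these assumptions, the mean k-occurrence is k by construction, while the maximum and the number of never-listed points show how unevenly the "recommendations" are spread. A negative correlation between distance to the data center and k-occurrence matches the claim that points near the center are the ones listed most often.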
