A scale-free distribution of false positives for a large class of audio similarity measures

The "bag-of-frames" (BOF) approach to audio pattern recognition models a signal as the long-term statistical distribution of its local spectral features; a prototypical implementation is a Gaussian Mixture Model of Mel-Frequency Cepstrum Coefficients. This approach is the predominant paradigm for extracting high-level descriptions from music signals, such as instrument, genre or mood, and it can also be used to compute timbre similarity between songs directly. However, a recent study by the authors shows that, when applied to music, this class of algorithms tends to produce false positives that are almost always the same songs, regardless of the query. In other words, with such models there exist songs, which we call hubs, that are irrelevantly close to a very large number of other songs. This paper reports on a number of experiments, using implementations on large music databases, that aim to better understand the nature and causes of such hub songs. We introduce two measures of "hubness": the number of n-occurrences and the mean neighbor angle. We find that in typical music databases hubs follow a scale-free distribution: non-hub songs are extremely common and large hubs are extremely rare, but they do exist. Moreover, we establish that hubs are not a property of a given modelling strategy (e.g. static vs dynamic, parametric vs non-parametric) but tend to occur with any type of model, although only for data with a given amount of "heterogeneity" (to be defined). This suggests that the existence of hubs could be an important phenomenon that generalizes beyond the specific problem of music modelling and reflects a general structural property of an important class of pattern recognition algorithms.
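The first hubness measure can be illustrated with a minimal sketch. The abstract does not define the number of n-occurrences, so the code assumes the common reading: the number of times a song appears among the n nearest neighbors of the other songs in the database. The function name `n_occurrences`, the toy one-dimensional "songs" and the absolute-difference distance are all hypothetical stand-ins for a real timbre distance, not the authors' implementation.

```python
def n_occurrences(dist, n):
    """Count how often each item appears in the n-nearest-neighbor
    lists of the other items, given a square distance matrix.

    Assumed definition (not taken from the abstract): an item's count
    is the number of queries for which it ranks among the n closest.
    """
    size = len(dist)
    counts = [0] * size
    for query in range(size):
        # Rank every other item by distance to the query;
        # ties are broken by index so the result is deterministic.
        others = [j for j in range(size) if j != query]
        others.sort(key=lambda j: (dist[query][j], j))
        for neighbor in others[:n]:
            counts[neighbor] += 1
    return counts


# Toy data: five hypothetical "songs" placed on a line, with the
# absolute difference standing in for a timbre distance.
positions = [0.0, 4.0, 5.0, 6.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]

counts = n_occurrences(dist, n=1)
print(counts)
```

Each query contributes exactly n neighbors, so the counts always sum to n times the number of songs and their mean is n. Under this reading, a hub is a song whose count is far above n, and plotting the histogram of counts over a whole database is one way to inspect the shape of the distribution.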
