Clustering Music Recordings by Their Keys

Musical key, a high-level feature of musical audio, is an effective tool for the structural analysis of musical works. This paper presents a novel unsupervised approach for clustering music recordings by key. Using chroma-based features extracted from the acoustic signals, an inter-recording distance metric is introduced that characterizes both the diversity of the pitch distribution and the harmonic center of each piece, and it is used to measure dissimilarity between recordings. The recordings are then partitioned by unsupervised clustering, with the number of clusters determined automatically by minimizing an estimated Rand Index. Any existing key-detection technique can then be applied to assign a key to each cluster. Empirical evaluation on a dataset of 91 pop songs yields an average cluster purity of 57.3% and a Rand Index close to 50%. These results suggest that the framework could be integrated with existing key-identification techniques to improve their accuracy, drawing on the strong cross-correlation data it provides for the input dataset.
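The two evaluation measures named above can be illustrated with a minimal sketch. It makes several assumptions not stated in the abstract: purity is computed with the common majority-label definition, and the Rand Index is taken as the fraction of recording pairs on which the clustering and the true keys disagree (a lower-is-better form, consistent with the abstract's "minimizing" wording). The paper's exact definitions may differ, and the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_purity(true_keys, cluster_ids):
    """Majority-label purity (assumed definition): the share of recordings
    whose key matches the most common key in their cluster."""
    members = {}
    for key, cid in zip(true_keys, cluster_ids):
        members.setdefault(cid, []).append(key)
    majority_total = sum(
        Counter(keys).most_common(1)[0][1] for keys in members.values()
    )
    return majority_total / len(true_keys)


def pair_disagreement(true_keys, cluster_ids):
    """Fraction of recording pairs that are grouped together in exactly one
    of the two partitions (assumed lower-is-better Rand-style measure)."""
    pairs = list(combinations(range(len(true_keys)), 2))
    disagreements = sum(
        (true_keys[i] == true_keys[j]) != (cluster_ids[i] == cluster_ids[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# Hypothetical toy data: six recordings, three true keys, three clusters.
keys = ["C", "C", "G", "G", "Am", "Am"]
clusters = [0, 0, 0, 1, 2, 2]

purity = cluster_purity(keys, clusters)
disagreement = pair_disagreement(keys, clusters)
```

In this toy case one G-major recording is placed in the C-major cluster, so five of six recordings match their cluster's majority key and three of the fifteen recording pairs are grouped inconsistently.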
