Modeling concept dynamics for large-scale music search

Continuing advances in data storage and communication technologies have led to explosive growth in digital music collections. To cope with their increasing scale, we need effective Music Information Retrieval (MIR) capabilities such as tagging, concept search, and clustering. Integral to MIR is a framework for modeling music documents and generating discriminative signatures for them. In this paper, we introduce a multimodal, layered learning framework called DMCM. Unlike existing approaches that encode music as an ensemble of orderless feature vectors, our framework extracts a variety of acoustic features from each music document and translates them into low-level encodings along the temporal dimension. From these, DMCM elucidates the concept dynamics in the music document and represents them with a novel music signature scheme called the Stochastic Music Concept Histogram (SMCH), which captures the probability distribution over all concepts. Experimental results on two large music collections confirm the advantages of the proposed framework over existing methods on various MIR tasks.
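The abstract does not specify how an SMCH is computed; as a rough illustration only, the following sketch shows one plausible reading of a histogram-style signature: per-frame concept posteriors (assumed to come from some upstream concept model, not described here) are aggregated over time into a single probability distribution over concepts. The function name `smch_signature` and the averaging step are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def smch_signature(frame_posteriors):
    """Aggregate per-frame concept posteriors into one probability
    distribution over concepts (an SMCH-style document signature).

    frame_posteriors: (T, K) array; each row is a distribution over
    K concepts for one temporal frame of the music document.
    """
    p = np.asarray(frame_posteriors, dtype=float)
    hist = p.mean(axis=0)        # average concept probability over time
    return hist / hist.sum()     # renormalize to a valid distribution

# Toy example: 4 frames, 3 hypothetical concepts
frames = [[0.7, 0.2, 0.1],
          [0.6, 0.3, 0.1],
          [0.1, 0.8, 0.1],
          [0.2, 0.7, 0.1]]
sig = smch_signature(frames)     # array([0.4, 0.5, 0.1])
```

Because the signature is itself a probability distribution, documents can then be compared with standard distribution distances (e.g. KL divergence or cosine similarity) for retrieval and clustering.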
