Convex non-negative matrix factorization for automatic music structure identification

We propose a novel and fast approach to discovering structure in Western popular music. It uses a specific type of matrix factorization that adds a convex constraint, yielding a decomposition that can be interpreted as a set of weighted cluster centroids. We show that these centroids capture the different sections of a musical piece (e.g., verse, chorus) more consistently and efficiently than classic non-negative matrix factorization. The technique first identifies the boundaries of the sections and then groups the sections into clusters. We evaluate the method on two datasets and show that it is competitive with other music segmentation techniques and outperforms other matrix factorization methods.
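
To make the idea of "weighted cluster centroids" concrete, the following is a minimal sketch of a convex non-negative matrix factorization, not the authors' implementation. It assumes a feature matrix X of shape (n_features, n_frames), approximates it as X ≈ X W Gᵀ with W, G ≥ 0, and uses multiplicative updates in the style of Ding et al. [20]. The function names, the random initialization, the iteration count, and the toy data are all assumptions made for illustration. The sketch enforces only non-negativity of W; it does not normalize the columns of W to sum to one.

```python
import numpy as np


def convex_nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Sketch of convex NMF: X (n_features, n_frames) ~ X @ W @ G.T.

    Returns W and G, both of shape (n_frames, rank) and non-negative.
    The centroids are F = X @ W, i.e. non-negative combinations of frames.
    """
    rng = np.random.default_rng(seed)
    n_frames = X.shape[1]
    # Hypothetical initialization: small positive random values.
    W = rng.random((n_frames, rank)) + eps
    G = rng.random((n_frames, rank)) + eps

    # Split the Gram matrix into its positive and negative parts.
    Y = X.T @ X
    Yp = (np.abs(Y) + Y) / 2.0
    Yn = (np.abs(Y) - Y) / 2.0

    for _ in range(n_iter):
        # Update the frame-to-cluster weights G.
        GWt = G @ W.T
        G *= np.sqrt((Yp @ W + GWt @ (Yn @ W) + eps)
                     / (Yn @ W + GWt @ (Yp @ W) + eps))
        # Update the centroid weights W.
        WGtG = W @ (G.T @ G)
        W *= np.sqrt((Yp @ G + Yn @ WGtG + eps)
                     / (Yn @ G + Yp @ WGtG + eps))
    return W, G


def reconstruction_error(X, W, G):
    """Frobenius norm of X - X W G^T."""
    return np.linalg.norm(X - X @ W @ G.T)


def make_toy_features(seed=1):
    """Toy 'song' with two sections of 20 frames each (illustrative data only)."""
    rng = np.random.default_rng(seed)
    a, b = rng.random(12), rng.random(12)
    sections = np.hstack([np.outer(a, np.ones(20)), np.outer(b, np.ones(20))])
    return sections + 0.01 * rng.random(sections.shape)


if __name__ == "__main__":
    X = make_toy_features()
    W, G = convex_nmf(X, rank=2)
    centroids = X @ W                 # one column per cluster centroid
    labels = np.argmax(G, axis=1)     # crude per-frame cluster assignment
    print(reconstruction_error(X, W, G), labels)
```

In this sketch each row of G weights the centroids for one frame, so taking the largest entry per row gives a rough section label for that frame; how boundaries and labels are actually derived from the decomposition is left to the paper itself.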

[1]  Meinard Müller,et al.  Transposition-Invariant Self-Similarity Matrices , 2007, ISMIR.

[2]  Thomas Sikora,et al.  Music Structure Discovery in Popular Music using Non-negative Matrix Factorization , 2010, ISMIR.

[3]  Hsin-Min Wang,et al.  Learning the Similarity of Audio Music in Bag-of-frames Representation from Tagged Music Data , 2011, ISMIR.

[4]  Christian Bauckhage,et al.  Convex Non-negative Matrix Factorization in the Wild , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[5]  Masataka Goto,et al.  A chorus-section detecting method for musical audio signals , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[6]  Masataka Goto,et al.  A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting , 2007, ISMIR.

[7]  Simon Dixon,et al.  10 th International Society for Music Information Retrieval Conference ( ISMIR 2009 ) USING MUSICAL STRUCTURE TO ENHANCE AUTOMATIC CHORD TRANSCRIPTION , 2009 .

[8]  Mark B. Sandler,et al.  Structural Segmentation of Musical Audio by Constrained Clustering , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Mark B. Sandler,et al.  A Markov-Chain Monte-Carlo Approach to Musical Audio Segmentation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[10]  Jonathan Foote,et al.  Automatic audio segmentation using a measure of audio novelty , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[11]  C.-C. Jay Kuo,et al.  Similarity matrix processing for music structure analysis , 2006, AMCMM '06.

[12]  Xavier Rodet,et al.  Toward Automatic Music Audio Summary Generation from Signal Analysis , 2002, ISMIR.

[13]  Haesun Park,et al.  Sparse Nonnegative Matrix Factorization for Clustering , 2008 .

[14]  Meinard Müller,et al.  Audio-based Music Structure Analysis , 2010 .

[15]  Perfecto Herrera,et al.  Semantic Segmentation of Music audio Contents , 2005, ICMC.

[16]  Markus Schedl,et al.  Using Mutual Proximity to Improve Content-Based Audio Similarity , 2011, ISMIR.

[17]  Jürgen Kurths,et al.  Recurrence plots for the analysis of complex systems , 2009 .

[18]  Ron J. Weiss,et al.  Unsupervised Discovery of Temporal Structure in Music , 2011, IEEE Journal of Selected Topics in Signal Processing.

[19]  Jordan B. L. Smith,et al.  Design and creation of a large-scale database of structural annotations , 2011, ISMIR.

[20]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Hanna M. Lukashevich Towards Quantitative Measures of Evaluating Song Segmentation , 2008, ISMIR.

[22]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.