Boundary Detection in Music Structure Analysis using Convolutional Neural Networks

The recognition of boundaries, e.g., between chorus and verse, is an important task in music structure analysis. The goal is to automatically detect such boundaries in audio signals so that the results are close to human annotation. In this work, we apply Convolutional Neural Networks to the task, trained directly on mel-scaled magnitude spectrograms. On a representative subset of the SALAMI structural annotation dataset, our method outperforms current techniques in terms of boundary retrieval F-measure at different temporal tolerances: we advance the state of the art from 0.33 to 0.46 for tolerances of ±0.5 seconds, and from 0.52 to 0.62 for tolerances of ±3 seconds. As the algorithm is trained on annotated audio data without the need for expert knowledge, we expect it to be easily adaptable to changed annotation guidelines and also to related tasks such as the detection of song transitions.
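
To make the evaluation measure concrete, the following is a minimal sketch of a boundary retrieval F-measure at a given temporal tolerance. It is an illustration under stated assumptions, not the evaluation code used in this work: the function name `boundary_f_measure`, the one-to-one matching rule, and the example boundary times are all hypothetical.

```python
def boundary_f_measure(reference, estimated, tolerance=0.5):
    """F-measure for boundary times (in seconds) at +/- `tolerance` seconds.

    Assumption: each reference boundary may be matched by at most one
    estimated boundary, and vice versa. Matching is done with a
    two-pointer sweep over the sorted boundary times.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            # Estimated boundary falls inside the tolerance window: a hit.
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            # Estimate is too early to match this or any later reference.
            j += 1
        else:
            # Reference is too early to match this or any later estimate.
            i += 1
    if hits == 0:
        return 0.0
    precision = hits / len(est)  # fraction of estimates that are correct
    recall = hits / len(ref)     # fraction of annotations that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotated and predicted boundary times, in seconds.
annotated = [10.0, 30.0, 60.0]
predicted = [10.3, 32.0, 59.8, 80.0]

f_narrow = boundary_f_measure(annotated, predicted, tolerance=0.5)
f_wide = boundary_f_measure(annotated, predicted, tolerance=3.0)
```

With the narrow tolerance, the prediction at 32.0 s misses the annotation at 30.0 s; with the wide tolerance it counts as a hit. This is why scores at ±3 seconds are typically higher than at ±0.5 seconds for the same predictions.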
