Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-Task Learning

This paper presents a novel supervised approach to detecting the chorus segments in popular music. Traditional approaches to this task are mostly unsupervised, with pipelines designed to target some quality that is assumed to define "chorusness," which usually means seeking the loudest or most frequently repeated sections. We propose to use a convolutional neural network with a multi-task learning objective, which simultaneously fits two temporal activation curves: one indicating "chorusness" as a function of time, and the other indicating the location of the boundaries. We also propose a post-processing method that jointly takes into account the chorus and boundary predictions to produce binary output. In experiments using three datasets, we compare our system to a set of public implementations of other segmentation and chorus-detection algorithms, and find that our approach performs significantly better.
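The two ideas in the abstract, a joint objective over two activation curves and a post-processing step that combines them into a binary output, can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the loss, the task weighting, or the post-processing rule. The sketch assumes a weighted sum of binary cross-entropy terms, simple local-maximum peak picking for boundaries, and a per-segment mean threshold on the chorus curve. All function names, the `boundary_weight` parameter, and both thresholds are hypothetical.

```python
import numpy as np


def multitask_loss(chorus_pred, boundary_pred, chorus_true, boundary_true,
                   boundary_weight=0.5):
    """Assumed joint objective: BCE on the chorus curve plus weighted BCE
    on the boundary curve. The weighting is a hypothetical choice."""
    def bce(p, y):
        p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
        y = np.asarray(y, dtype=float)
        return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

    return bce(chorus_pred, chorus_true) + boundary_weight * bce(
        boundary_pred, boundary_true)


def pick_boundaries(boundary_curve, threshold=0.5):
    """Return indices of interior local maxima at or above the threshold."""
    b = np.asarray(boundary_curve, dtype=float)
    return [i for i in range(1, len(b) - 1)
            if b[i] >= threshold and b[i] > b[i - 1] and b[i] >= b[i + 1]]


def binarize_chorus(chorus_curve, boundary_curve,
                    boundary_threshold=0.5, chorus_threshold=0.5):
    """Assumed post-processing: split the track at picked boundaries, then
    label a whole segment as chorus if its mean chorus activation is high."""
    c = np.asarray(chorus_curve, dtype=float)
    edges = [0] + pick_boundaries(boundary_curve, boundary_threshold) + [len(c)]
    out = np.zeros(len(c), dtype=int)
    for start, end in zip(edges[:-1], edges[1:]):
        if end > start and c[start:end].mean() >= chorus_threshold:
            out[start:end] = 1
    return out


# Toy per-frame activations (made-up values, not model output).
chorus = [0.1, 0.2, 0.1, 0.6, 0.9, 0.8, 0.7, 0.2, 0.1, 0.1]
boundary = [0.0, 0.1, 0.2, 0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0]
labels = binarize_chorus(chorus, boundary)
print(labels.tolist())
```

On the toy input, boundaries are picked at frames 3 and 7, so only the middle segment (frames 3 to 6) is labelled as chorus. Deciding per segment rather than per frame is one way to keep the binary output aligned with the predicted boundaries; other combination rules are equally plausible.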
