A chorus section detection method for musical audio signals and its application to a music listening station

This paper describes a method for obtaining a list of repeated chorus ("hook") sections in compact-disc recordings of popular music. Detecting chorus sections is essential for the computational modeling of music understanding and is useful in applications such as automatic chorus-preview and chorus-search functions in music listening stations, music browsers, and music retrieval systems. Most previous methods detected a repeated section of a given length as the chorus, and they had difficulty identifying both ends of a chorus section and dealing with modulations (key changes). By analyzing relationships between various repeated sections, our method, called RefraiD, can detect all the chorus sections in a song and estimate both ends of each section. It can also detect modulated chorus sections by introducing a perceptually motivated acoustic feature and a similarity measure that enable detection of a repeated chorus section even after modulation. Experimental results with a popular music database showed that this method correctly detected the chorus sections in 80 of 100 songs. This paper also describes an application of our method, a new music-playback interface for trial listening called SmartMusicKIOSK, which enables a listener to jump directly to the chorus section and listen to it while viewing a graphical overview of the entire song structure. The results of implementing this application have demonstrated its usefulness.
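The idea of a similarity that survives a key change can be illustrated with a small sketch. It assumes, beyond what the abstract states, that each audio frame is summarized as a 12-bin pitch-class (chroma) vector, so that a modulation by some number of semitones appears as a circular shift of the bins. The function names and the normalization are illustrative choices, not taken from the paper.

```python
import math

def normalize(v):
    """Scale a non-negative vector so its largest element is 1 (assumed normalization)."""
    peak = max(v)
    return [x / peak for x in v] if peak > 0 else list(v)

def shift(v, semitones):
    """Circularly shift a chroma vector upward by `semitones` bins."""
    n = len(v)
    return [v[(i - semitones) % n] for i in range(n)]

def similarity(a, b):
    """1 minus the Euclidean distance scaled by sqrt(len), giving a value in [0, 1]."""
    a, b = normalize(a), normalize(b)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 - dist / math.sqrt(len(a))

def best_modulation(a, b):
    """Try all 12 shifts of `a` and return (semitones, similarity) for the best match to `b`."""
    candidates = ((s, similarity(shift(a, s), b)) for s in range(12))
    return max(candidates, key=lambda pair: pair[1])

# Hypothetical frames: a C major triad and the same triad two semitones higher.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
d_major = shift(c_major, 2)

plain = similarity(c_major, d_major)        # low: the pitch classes no longer line up
semitones, shifted = best_modulation(c_major, d_major)
```

Comparing the two frames directly gives a low score, while searching over the 12 possible shifts recovers the two-semitone modulation with a perfect match. A full detector would apply such a comparison across all frame pairs of a song rather than to a single pair.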
