Music scene description project: Toward audio-based real-time music understanding

Music understanding is an important component of audio-based interactive music systems. A real-time music scene description system for the computational modeling of music understanding is proposed. This research is based on the assumption that a listener understands music without deriving musical scores or even fully segregating signals. In keeping with this assumption, our music scene description system produces intuitive descriptions of music, such as the beat structure and the melody and bass lines. Two real-time subsystems have been developed: a beat-tracking subsystem and a melody-and-bass detection subsystem, both of which can deal with real-world monaural audio signals sampled from popular-music CDs. The beat-tracking subsystem recognizes a hierarchical beat structure comprising the quarter-note, half-note, and measure levels by using three kinds of musical knowledge: about onset times, about chord changes, and about drum patterns. The melody-and-bass detection subsystem estimates the fundamental frequency (F0) of the melody and bass lines by using a predominant-F0 estimation method called PreFEst, which does not rely on the often-unreliable fundamental frequency component itself but instead obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. Several applications of music understanding are described, including a beat-driven, real-time computer-graphics and lighting controller.
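The core idea behind predominant-F0 estimation can be sketched as follows. This is a minimal toy illustration, not the PreFEst algorithm itself (which fits adaptive tone models by MAP estimation with the EM algorithm): it simply scores each F0 candidate by the spectral energy found at its harmonics, weighting lower harmonics more, and picks the best-supported candidate within an intentionally limited search range. The function name and parameters are hypothetical.

```python
# Toy sketch of predominant-F0 estimation by harmonic support.
# NOT the actual PreFEst method (MAP estimation with EM over
# adaptive tone models); this only illustrates the core idea of
# picking the F0 best supported by harmonics in a limited range.

def predominant_f0(spectrum, bin_hz, f0_min, f0_max,
                   n_harmonics=8, step_hz=1.0):
    """Return the F0 candidate (Hz) with the strongest harmonic support.

    spectrum       -- magnitude spectrum, one value per frequency bin
    bin_hz         -- width of one frequency bin in Hz
    f0_min, f0_max -- intentionally limited F0 search range in Hz
    """
    best_f0, best_score = None, float("-inf")
    n_steps = int((f0_max - f0_min) / step_hz) + 1
    for k in range(n_steps):
        f0 = f0_min + k * step_hz
        score = 0.0
        for h in range(1, n_harmonics + 1):
            idx = int(round(h * f0 / bin_hz))  # bin of the h-th harmonic
            if idx >= len(spectrum):
                break
            score += spectrum[idx] / h  # emphasize lower harmonics
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0
```

Restricting the search range is what lets the same scoring idea serve both lines: a high range for the melody and a low range for the bass, so the two detections do not compete for the same candidates.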
