Audio-visual beat tracking based on a state-space model for a music robot dancing with humans

This paper presents an audio-visual beat-tracking method for an entertainment robot that can dance in synchronization with music and human dancers. Conventional music robots have relied on either music audio signals or the dancing movements of humans to detect and predict beat times in real time. Because a robot must record music with its own microphones, however, the audio signals are severely contaminated by loud environmental noise and reverberation. Moreover, it is difficult to visually detect beat times from real, complicated dancing movements, which exhibit weaker repetitive characteristics than music audio signals do. To solve these problems, we propose a state-space model that integrates audio and visual information in a probabilistic manner. At each frame, the method extracts acoustic features (audio tempos and onset likelihoods) from music audio signals and skeleton features from the movements of a human dancer. The current tempo and the next beat time are then estimated from these observed features with a particle filter. Experimental results showed that the proposed multi-modal method, which uses a depth sensor (Kinect) to extract skeleton features, outperformed conventional mono-modal methods by 0.20 in F-measure of beat-tracking accuracy in a noisy and reverberant environment.
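The abstract describes the probabilistic audio-visual integration only at a high level. The sketch below illustrates, in rough form, how a particle filter might fuse an audio onset-strength envelope and a skeleton-derived motion period to estimate the current tempo and the next beat time. The state parameterization, the observation models (`audio_onset_likelihood`, `visual_motion_likelihood`), and all noise parameters are assumptions introduced for illustration; they are not the state-space model actually proposed in the paper.

```python
import numpy as np

# Minimal particle-filter sketch for audio-visual beat tracking (illustrative only).
# State per particle: (tempo_bpm, next_beat_time_sec).
# The two likelihood functions below are placeholder observation models.

rng = np.random.default_rng(0)
N_PARTICLES = 500

def audio_onset_likelihood(beat_time, onset_env, frame_rate):
    """Likelihood of a beat at `beat_time` given an onset-strength envelope."""
    idx = int(round(beat_time * frame_rate))
    if 0 <= idx < len(onset_env):
        return 1e-3 + onset_env[idx]
    return 1e-3

def visual_motion_likelihood(tempo_bpm, motion_period_sec):
    """Likelihood that the dancer's estimated motion period matches the tempo."""
    beat_period = 60.0 / tempo_bpm
    return np.exp(-0.5 * ((motion_period_sec - beat_period) / 0.05) ** 2) + 1e-3

def step(particles, weights, onset_env, motion_period, frame_rate):
    # 1. Propagate: jitter tempo, advance next beat time by one beat period.
    particles[:, 0] += rng.normal(0.0, 1.0, len(particles))    # tempo noise (BPM)
    particles[:, 1] += 60.0 / particles[:, 0]                  # one beat period ahead
    particles[:, 1] += rng.normal(0.0, 0.01, len(particles))   # timing noise (s)

    # 2. Weight each particle by the product of audio and visual likelihoods.
    for i, (tempo, beat_t) in enumerate(particles):
        w_audio = audio_onset_likelihood(beat_t, onset_env, frame_rate)
        w_video = visual_motion_likelihood(tempo, motion_period)
        weights[i] *= w_audio * w_video
    weights /= weights.sum()

    # 3. Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = rng.choice(len(particles), len(particles), p=weights)
        particles[:] = particles[idx]
        weights[:] = 1.0 / len(particles)

    # Point estimates: weighted means of tempo and next beat time.
    return weights @ particles[:, 0], weights @ particles[:, 1]

# Toy usage with synthetic observations (120 BPM music, 100 Hz feature frame rate).
frame_rate = 100.0
onset_env = np.zeros(1000)
onset_env[::50] = 1.0                        # onsets every 0.5 s
particles = np.column_stack([rng.uniform(60, 180, N_PARTICLES),
                             rng.uniform(0.0, 1.0, N_PARTICLES)])
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)
tempo_est, next_beat = step(particles, weights, onset_env, 0.5, frame_rate)
print(f"estimated tempo ~ {tempo_est:.1f} BPM, next beat ~ {next_beat:.2f} s")
```

In this toy setup the visual term rewards tempo hypotheses consistent with the dancer's motion period, while the audio term rewards beat-time hypotheses that land on strong onsets, which is the general spirit of the multi-modal fusion the abstract describes.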
