A robot listens to music and counts its beats aloud by separating music from counting voice

This paper presents a beat-counting robot that can count musical beats aloud, i.e., speak "one, two, three, four, one, two, ..." along with the music, while listening to the music with its own ears. Music-understanding robots that interact with humans should be able not only to recognize music internally, but also to express their internal states. To develop our beat-counting robot, we tackled three issues: (1) recognition of hierarchical beat structures, (2) expression of these structures by counting beats, and (3) suppression of the counting voice (a self-generated sound) in the sound mixtures recorded by the robot's ears. The main issue is (3), because interference from the counting voice degrades beat-recognition accuracy. We therefore designed an architecture for a music-understanding robot that can deal with self-generated sounds. To solve these issues, we took the following approaches: (1) beat-structure prediction based on musical knowledge of chords and drums, (2) speed control of the counting voice according to the music tempo via a vocoder called STRAIGHT, and (3) semi-blind separation of the sound mixtures into music and counting voice via an adaptive filter based on ICA (independent component analysis) that uses the waveform of the counting voice as prior knowledge. Experimental results showed that suppressing the robot's own voice improved its music-recognition capability.
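The semi-blind separation in approach (3) exploits the fact that the robot already knows the waveform of its own counting voice, so the problem reduces to cancelling a known reference signal from the ear-microphone mixture. The sketch below illustrates this idea with a normalized LMS (NLMS) adaptive filter, a simpler stand-in for the ICA-based filter described in the paper; the function name, filter order, and step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cancel_self_voice(mic, ref, order=32, mu=0.5, eps=1e-8):
    """Suppress a known reference signal (the robot's counting voice)
    from a microphone mixture with an NLMS adaptive filter.
    The returned error signal approximates the music alone."""
    w = np.zeros(order)              # adaptive filter weights
    out = np.zeros(len(mic))
    for n in range(order, len(mic)):
        x = ref[n - order:n][::-1]   # most recent reference samples
        y = w @ x                    # estimated voice component at the mic
        e = mic[n] - y               # residual = music estimate
        w += mu * e * x / (x @ x + eps)  # normalized weight update
        out[n] = e
    return out

# Toy demo: "music" plus a delayed, attenuated copy of the counting voice
rng = np.random.default_rng(0)
music = 0.1 * rng.standard_normal(20000)
voice = rng.standard_normal(20000)
echo = 0.8 * np.roll(voice, 5)       # simulated voice path to the ear mics
mic = music + echo
cleaned = cancel_self_voice(mic, voice)
```

After the filter converges, the residual voice energy in `cleaned` is far below the original interference, mirroring the paper's finding that suppressing the robot's own voice improves downstream music recognition.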
