Automatic speaker tracking by camera using two-channel-based sound source localization

Purpose – The purpose of this paper is two‐fold: first, to address audio speaker localization and, second, to address mobile camera control. Speaker localization consists of determining the position of the active speaker; camera control consists of orienting a mobile camera towards that speaker. Together, these steps constitute speaker tracking, which is the overall goal of the research work.

Design/methodology/approach – In this approach, two‐channel‐based estimation of the speaker position is achieved by comparing the signals received by two cardioid microphones, which are placed one against the other and separated by a fixed distance. The localization technique presented in this paper is inspired by the human ears, which act as two different sound observation points, enabling humans to estimate the direction of a speaking person with good precision. Concerning the camera control part, the authors have conceived an auto...
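As a rough illustration of what comparing the signals of two microphones can look like, the sketch below computes the energy difference between a left and a right channel frame and maps it to a coarse direction. It is only a minimal sketch under stated assumptions, not the authors' method: the function names, the 1 dB threshold, and the synthetic test tone are all hypothetical choices made for this example.

```python
import math

def channel_energy(samples):
    """Mean squared amplitude of one channel's frame."""
    return sum(s * s for s in samples) / len(samples)

def level_difference_db(left, right, eps=1e-12):
    """Left-to-right energy ratio in dB; positive means the left channel is louder."""
    return 10.0 * math.log10((channel_energy(left) + eps) / (channel_energy(right) + eps))

def coarse_direction(left, right, threshold_db=1.0):
    """Map the level difference to 'left', 'right', or 'center'.

    threshold_db is an assumed value for illustration, not taken from the paper.
    """
    diff = level_difference_db(left, right)
    if diff > threshold_db:
        return "left"
    if diff < -threshold_db:
        return "right"
    return "center"

# Synthetic frame: the same 200 Hz tone, attenuated on the right channel.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
left_frame = [1.0 * s for s in tone]
right_frame = [0.5 * s for s in tone]

print(coarse_direction(left_frame, right_frame))
```

In a camera-control setting, such a coarse decision could in principle drive a pan command towards the louder side, although a practical system would need smoothing over successive frames and calibration of the two microphones.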
