Virtual speaker tracking by camera using a sound source localisation with two microphones

Our research addresses the problem of automatic speaker tracking by camera. Such tracking systems exist today, but they suffer from a number of mechanical problems. To overcome these problems, we propose a virtual tracking system that uses a fixed camera and requires no mechanical parts. But is it possible to track a moving speaker with a fixed camera? If the task is already difficult with a mobile camera, how much harder is it with a fixed one? As a solution, we have designed a virtual tracking system that performs the required task using only two cardioid microphones and a conventional video camera. In this system, the speaker is tracked by orienting the camera's region of interest (ROI) towards the active speaker; we call this method the virtual region of interest (VROI) based technique. Experiments show that the new virtual technique performs well.
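
To make the idea of steering a region of interest inside a fixed frame more concrete, here is a minimal sketch. It is not the authors' algorithm: the abstract does not specify how the two microphone signals are turned into a direction, so this sketch swaps in a plain cross-correlation time-delay estimate and a far-field, pinhole-camera geometry. The function names, the microphone spacing, the field of view, and the sign convention (positive angle means the source is nearer the left microphone, which is assumed to correspond to the left side of the image) are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay(left, right, sample_rate):
    """Delay in seconds of `right` relative to `left` (cross-correlation peak).

    A positive value means the sound reached the left microphone first.
    """
    corr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(corr)) - (len(left) - 1)  # lag in samples
    return lag / sample_rate


def delay_to_angle(delay_s, mic_spacing_m):
    """Far-field bearing in radians; positive means towards the left microphone."""
    sine = np.clip(SPEED_OF_SOUND * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.arcsin(sine))


def roi_for_angle(angle_rad, frame_width, roi_width, hfov_rad):
    """Horizontal pixel bounds (left, right) of the virtual ROI in a fixed frame.

    Assumes a pinhole camera with horizontal field of view `hfov_rad`,
    mounted midway between the microphones and facing the same way.
    """
    focal_px = (frame_width / 2) / np.tan(hfov_rad / 2)
    center = frame_width / 2 - focal_px * np.tan(angle_rad)
    left_edge = int(round(center - roi_width / 2))
    left_edge = max(0, min(frame_width - roi_width, left_edge))  # keep ROI in frame
    return left_edge, left_edge + roi_width


def make_delayed_pair(n_samples=2048, delay_samples=5, seed=0):
    """Synthetic stereo test signal: the right channel lags the left one."""
    rng = np.random.default_rng(seed)
    left = rng.standard_normal(n_samples)
    right = np.concatenate([np.zeros(delay_samples), left[:-delay_samples]])
    return left, right


if __name__ == "__main__":
    left, right = make_delayed_pair()
    delay = estimate_delay(left, right, sample_rate=16000)
    angle = delay_to_angle(delay, mic_spacing_m=0.3)
    print(roi_for_angle(angle, frame_width=1920, roi_width=640,
                        hfov_rad=np.deg2rad(60)))
```

In a real system the ROI position would be updated per audio frame and smoothed over time so that the cropped view does not jitter; that step is omitted here.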
