Face Recognition with Machine Learning in OpenCV: Fusion of the Results with the Localization Data of an Acoustic Camera for Speaker Identification

This contribution gives an overview of face recognition algorithms, their implementation, and their practical uses. First, a training set of different persons' faces has to be collected and used to train a face recognizer. The resulting face model can then be used to classify people either as specific individuals or as unknown. After the recognized face has been tracked and the position of the acoustic sound source has been estimated, both results can be combined to give detailed information about the possible speakers and whether or not they are talking. This leads to a precise real-time description of the situation, which can be used for further applications, e.g. for multi-channel speech enhancement by adaptive beamformers.
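The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the face recognizer already delivers a label and a horizontal pixel position per face, that the acoustic camera delivers a single source azimuth in degrees, and that both sensors share the same optical axis. The names `Face`, `pixel_to_azimuth`, and `label_speakers`, as well as the image width, field of view, and tolerance values, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Face:
    label: str       # recognized name, or "unknown" (assumed recognizer output)
    x_center: float  # horizontal centre of the face box in pixels


def pixel_to_azimuth(x_center: float, image_width: int,
                     horizontal_fov_deg: float) -> float:
    """Map a pixel column to an azimuth in degrees (0 = optical axis).

    Assumes an ideal pinhole camera without lens distortion.
    """
    focal_px = (image_width / 2) / math.tan(math.radians(horizontal_fov_deg / 2))
    return math.degrees(math.atan((x_center - image_width / 2) / focal_px))


def label_speakers(faces: List[Face],
                   source_azimuth_deg: Optional[float],
                   image_width: int = 640,
                   horizontal_fov_deg: float = 60.0,
                   tolerance_deg: float = 10.0) -> List[Tuple[str, bool]]:
    """Return (label, is_talking) for every face.

    At most one face is marked as talking: the one whose azimuth is closest
    to the acoustic source, provided the difference is within tolerance_deg.
    A source azimuth of None means that no sound source was detected.
    """
    talking_index = None
    if source_azimuth_deg is not None:
        best_error = tolerance_deg
        for i, face in enumerate(faces):
            face_azimuth = pixel_to_azimuth(face.x_center, image_width,
                                            horizontal_fov_deg)
            error = abs(face_azimuth - source_azimuth_deg)
            if error <= best_error:
                best_error = error
                talking_index = i
    return [(face.label, i == talking_index) for i, face in enumerate(faces)]


if __name__ == "__main__":
    faces = [Face("alice", 320.0), Face("unknown", 600.0)]
    print(label_speakers(faces, 25.0))
    # [('alice', False), ('unknown', True)]
```

Matching on azimuth alone is a deliberate simplification; a real setup would also need a calibration between the camera and the microphone array, and possibly elevation and temporal smoothing.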
