Speaker localization in noisy environments using steered response voice power

Many devices, including smart TVs and humanoid robots, can be operated through a speech interface. Since users may interact with such devices from a distance, speech-operated devices must be able to process distant speech signals. Although many methods exist to localize speakers via sound source localization, it is very difficult to reliably find the location of a speaker in a noisy environment. In particular, conventional sound source localization methods simply find the loudest sound source within a given area, and that source is not necessarily human speech. This is problematic in real environments, where loud noises occur frequently, and can degrade the performance of speech-based interfaces for a variety of devices. In this paper, a new speaker localization method based on steered response voice power is proposed: it identifies the candidate location with the maximum voice power. The proposed method is tested under a variety of conditions using both simulated and real data, and the results indicate that it outperforms a conventional algorithm for various types of noise.
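The search described above can be illustrated with a short, hedged sketch: for each candidate location, the microphone signals are steered (delay-and-sum) toward that point, the steered signal is scored by a voice-power measure, and the highest-scoring location is returned. The sketch below uses a simple speech-band energy (300-3400 Hz) as a stand-in for the paper's actual voice-power measure; the function names, the band limits, and the frequency-domain steering are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(frames, mic_positions, candidate, fs):
    """Steer the array toward a candidate location by aligning the channels
    (fractional delays applied as FFT phase shifts) and summing them."""
    n_mics, n_samples = frames.shape
    distances = np.linalg.norm(mic_positions - candidate, axis=1)
    delays = (distances - distances.min()) / SPEED_OF_SOUND  # relative delays (s)
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # Advance each channel by its relative delay so the speech components align.
    aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.sum(axis=0), n=n_samples)

def voice_power(signal, fs, band=(300.0, 3400.0)):
    """Hypothetical voice-power score: energy of the steered signal restricted
    to a typical speech band. The paper's measure may be defined differently."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.sum(np.abs(spectrum[mask]) ** 2)

def localize_speaker(frames, mic_positions, candidates, fs):
    """Return the candidate location whose steered signal has the largest voice power."""
    scores = [voice_power(delay_and_sum(frames, mic_positions, c, fs), fs)
              for c in candidates]
    return candidates[int(np.argmax(scores))]
```

As with conventional steered response power, the cost grows with the number of candidate locations, so in practice the candidate grid resolution trades off localization accuracy against computation.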
