Evaluation of Joint Position-Pitch Estimation Algorithm for Localising Multiple Speakers in Adverse Acoustical Environments

Automatic speaker localisation, detection and tracking are important challenges in multi-channel hands-free communication systems. In particular, simultaneous localisation of different speakers is of great interest for multi-microphone noise reduction schemes. Besides position, another feature that can distinguish between different speakers is the fundamental frequency (pitch) of the speakers’ voices. The recently proposed Position-Pitch (PoPi) estimation algorithm combines speaker localisation based on well-known cross-correlation approaches with pitch estimation techniques. In this contribution we evaluate the robustness of a modified version of the PoPi algorithm for localising simultaneous speakers in a realistic environment that includes room reverberation and different signal-to-noise ratios (SNR). To improve robustness, we focus in particular on modifications of the frequency-domain phase transformation T{·} used by the original PoPi algorithm.

Joint position-pitch estimation

Methods for speaker localisation such as that in [5] use a two-step approach to combine the estimation of the pitch f0 with localisation: in a first stage a pitch estimation algorithm is applied, and in a second stage the direction of arrival (DoA) φ0 is determined. The approach used here instead estimates pitch and position automatically in a single step using the so-called PoPi plane ρ(φ, f0), i.e.,