Joint Position-Pitch Tracking for 2-Channel Audio

This paper introduces a new representation for indexing acoustic sources in a multi-source environment. Each source is represented by a two-dimensional Gaussian-like probability distribution over pitch and direction of arrival (DoA). These features of the source candidates form the time-dependent Position-Pitch (PoPi) plane, which is extracted from 2-channel audio. For demonstration, the time evolution of the Gaussians corresponding to source candidates is tracked by a Viterbi decoder. The Viterbi tracking is extended to multiple paths, and similar paths are pruned using the normalized Levenshtein distance.
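The path-pruning step can be illustrated with a minimal sketch. It assumes each candidate path is a sequence of discrete states, here hypothetical `(doa_bin, pitch_bin)` tuples, that the paths arrive sorted best-first, and that the edit distance is normalized by the length of the longer path. The abstract does not specify the normalization, the threshold, or the state encoding, so all three are illustrative choices rather than the authors' method.

```python
from typing import Hashable, List, Sequence

State = Hashable  # e.g. a hypothetical (doa_bin, pitch_bin) tuple


def levenshtein(a: Sequence[State], b: Sequence[State]) -> int:
    """Classic edit distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: Sequence[State], b: Sequence[State]) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def prune_similar_paths(paths: List[Sequence[State]],
                        threshold: float = 0.3) -> List[Sequence[State]]:
    """Keep a path only if it differs enough from every path already kept.

    Assumes `paths` is sorted best-first, so the better-scoring path of
    any similar pair survives. The threshold value is hypothetical.
    """
    kept: List[Sequence[State]] = []
    for path in paths:
        if all(normalized_levenshtein(path, k) >= threshold for k in kept):
            kept.append(path)
    return kept


if __name__ == "__main__":
    candidates = [
        [(0, 5), (0, 5), (1, 5), (1, 6)],  # best path
        [(0, 5), (0, 5), (1, 5), (1, 5)],  # differs in one frame only
        [(7, 2), (7, 2), (7, 3), (7, 3)],  # a clearly different source
    ]
    for p in prune_similar_paths(candidates):
        print(p)
```

With these inputs, the second path differs from the first in one of four frames, which is below the threshold, so it is dropped; the third path shares no states with the first and is kept as a separate source track.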
