Estimation of Talker's Head Orientation Based on Discrimination of the Shape of Cross-power Spectrum Phase Coefficients

This paper presents a method for estimating a talker’s head orientation using only two microphone channels. Recent approaches estimate head orientation with a network of microphone arrays, using the sound amplitude or the peak value of the CSP (Cross-power Spectrum Phase) coefficients obtained from each array. Such systems, however, require many microphone arrays to be installed along the walls of the room so that the sub-arrays surround the user. We instead focus on the shape of the CSP coefficients, which is affected by reverberation and therefore depends on both the talker’s position and the head orientation. The proposed method uses not only the peak value but also the other values of the CSP coefficients as a feature vector, and estimates the talker’s position and head orientation by discriminating this CSP vector. Its effectiveness has been confirmed by talker localization and head orientation estimation experiments performed in a real environment.
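
As an illustration of the feature described above, the following is a minimal sketch of how CSP coefficients (also known as GCC-PHAT) can be computed from one frame of a two-channel signal, and how a window of coefficients around zero lag can be kept as a feature vector instead of the peak alone. The function names, the zero-padding, the `eps` regularizer, and the `max_lag` window size are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def csp_coefficients(x1, x2, eps=1e-12):
    """CSP (GCC-PHAT) coefficients of two equal-length frames.

    Returns an array of length 2 * len(x1) with zero lag at the center index.
    """
    n = len(x1)
    # Zero-pad to 2n so the circular correlation approximates a linear one.
    X1 = np.fft.rfft(x1, n=2 * n)
    X2 = np.fft.rfft(x2, n=2 * n)
    cross = X1 * np.conj(X2)
    # Phase transform: discard the magnitude, keep only the phase.
    cross = cross / (np.abs(cross) + eps)
    csp = np.fft.irfft(cross, n=2 * n)
    # Move zero lag from index 0 to the center of the array.
    return np.fft.fftshift(csp)

def csp_feature(x1, x2, max_lag):
    """Hypothetical feature: CSP values for lags -max_lag..+max_lag."""
    csp = csp_coefficients(x1, x2)
    center = len(csp) // 2
    return csp[center - max_lag : center + max_lag + 1]

# Usage: channel 2 is channel 1 delayed by 5 samples (synthetic, no reverberation).
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delayed = np.concatenate([np.zeros(5), s[:-5]])
feature = csp_feature(s, delayed, max_lag=20)
peak_lag = int(np.argmax(feature)) - 20
```

In this synthetic case the vector has a single sharp peak at the lag matching the delay. In a reverberant room the coefficients around the peak would also carry energy, and it is this overall shape that a classifier trained on labeled positions and orientations would discriminate.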
