Estimating Key Speaker in Meeting Speech Based on Multiple Features Optimization

This paper proposes a method for estimating the key speaker in meeting speech based on multiple-feature optimization. First, each feature is defined, and the differences in that feature between the key speaker and the other speakers are analyzed. Then, a decision function that weights the multiple features is constructed for estimating the key speaker, and a genetic algorithm is used to optimize the feature weighting coefficients. The methods are evaluated on three different meeting speech datasets. Experimental results show that the proposed optimization method achieves an average accuracy of 93.3% for key speaker estimation, an average accuracy improvement of 9.7% and 4.1% over the previous method and the feature weighting method without optimization, respectively.
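The weighted decision function and its genetic-algorithm tuning can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three per-speaker features, the toy data, the linear form of the decision function, and the operators (truncation selection, one-point crossover, clipped Gaussian mutation). The fitness is simply the fraction of meetings in which the highest-scoring speaker is the labeled key speaker.

```python
import random


def key_speaker(features, weights):
    """Return the index of the speaker with the highest weighted feature score."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, meetings):
    """Fraction of meetings whose labeled key speaker is estimated correctly."""
    hits = sum(key_speaker(feats, weights) == label for feats, label in meetings)
    return hits / len(meetings)


def optimize_weights(meetings, n_features, pop_size=30, generations=40,
                     mutation_rate=0.2, seed=0):
    """Search for feature weights in [0, 1] with a small genetic algorithm.

    The operators are illustrative choices, not the paper's configuration.
    """
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: accuracy(w, meetings),
                        reverse=True)
        parents = ranked[:pop_size // 2]   # truncation selection
        children = ranked[:2]              # elitism: keep the two best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:     # clipped Gaussian mutation
                i = rng.randrange(n_features)
                child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = children
    return max(population, key=lambda w: accuracy(w, meetings))


# Toy data (hypothetical): each meeting is (per-speaker feature rows, key speaker index).
# The three columns stand in for arbitrary normalized features.
meetings = [
    ([[0.6, 0.2, 0.1], [0.3, 0.5, 0.9], [0.1, 0.3, 0.0]], 0),
    ([[0.2, 0.9, 0.4], [0.7, 0.1, 0.2], [0.1, 0.0, 0.4]], 1),
    ([[0.3, 0.6, 0.3], [0.2, 0.2, 0.5], [0.5, 0.2, 0.2]], 2),
]

best = optimize_weights(meetings, n_features=3)
print("weights:", [round(w, 3) for w in best])
print("accuracy:", accuracy(best, meetings))
```

In this toy setup only the first feature separates the key speaker from the others, so a good weight vector should put most of its mass on that feature. A real evaluation would measure accuracy on meetings held out from the optimization.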
