Is it possible to know how many speakers are talking simultaneously when speech overlaps? The human brain, itself not yet fully understood, manages to do so and can even understand the meaning of the mixed speech, but existing automatic systems cannot yet. For this task, we propose a new method that estimates the number of speakers in a mixture of speech signals. The algorithm is based on computing the statistical characteristic of the 7th Mel coefficient, extracted from the speech signal by spectral analysis. It uses a confidence parameter, which we call PENS, and is tested on seven different sets of the ORATOR database, each containing seven multi-speaker files. The results show that the PENS parameter discriminates unambiguously between a mono-speaker signal (only one speaker is speaking) and a mixed-speaker signal (several speakers are speaking simultaneously). Moreover, for mixed speech signals, it estimates the number of speakers with good precision, especially when there are fewer than four speakers.