Multirate STC and Its Application to Multi-Speaker Conferencing

The problem of conferencing over systems that employ parametric vocoders has long been of interest to the military. In analog or wideband digital conferencing, overlapping speakers are handled by signal summation at a conferencing bridge. Such a scheme is not feasible for parametric vocoders, since it would require synthesis and reanalysis of the aggregate speech signal, a process called tandeming, which results in a severe loss of quality in the synthetic speech. Moreover, further degradation occurs when multiple speakers are active, since parametric vocoders are not designed to model more than one voice. One narrowband technique currently in use is based on the idea of signal selection: a speaker has the channel until finished or until replaced by someone with a higher priority, and speakers contend for the open channel when it becomes available [1]. The advantage of such a technique is that it avoids the degradations due to tandeming, but it is cumbersome. More natural conference control relies on interruptions, in which multiple speakers produce overlapping speech. One scheme that permits two-speaker overlaps assigns one-half of the available bandwidth to each speech coder and defers signal summation to the terminal [2]. This approach limits the overall quality of the conference by forcing each coder to work at half the bandwidth. Since only a single speaker is active for the majority of a conference, this technique degrades the overall perceived quality in order to model an event that occurs relatively infrequently.

[1] Thomas F. Quatieri, et al. Pitch estimation and voicing detection based on a sinusoidal speech model, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[2] R. J. McAulay, et al. Computationally efficient sine-wave synthesis and its application to sinusoidal transform coding, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[3] R. J. McAulay, et al. The sinusoidal transform coder at 2400 b/s, 1992, MILCOM 92 Conference Record.

[4] Thomas F. Quatieri, et al. Speech analysis/synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.

[5] Thomas F. Quatieri, et al. Sine-wave phase coding at low data rates, 1991, Proceedings of ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.