Improving Instrumental Sound Synthesis by Modeling the Effects of Performer Gesture

This paper deals with the effects of performers' gestures on the sound produced by an acoustic instrument, and with the modeling of these gestures. We focus on the effects of ancillary gestures: those not primarily intended to produce sound, yet omnipresent in the technique of high-level instrumentalists. We show that even in non-expressive performances on wind instruments, performers' ancillary gestures appear in the recorded sound as strong amplitude modulations of the partials. We claim that these modulations contribute to a naturalness that is usually lacking in current synthesis applications.

We analyzed video recordings of clarinetists in concert situations and identified typical gesture patterns. To demonstrate that these gestures are the main cause of the partial-amplitude modulations, we undertook an extensive set of clarinet, oboe, and saxophone recordings in reproducible environments, namely an anechoic chamber and a variable-acoustics auditorium. The recordings were made with multiple microphones at calibrated positions, always including one microphone inside the instrument. The notes were played in three styles: expressive (quasi-jazz), non-expressive, and with the instrument kept completely immobile by a mechanical apparatus. In parallel, experiments were conducted to determine the exact influence of the instrument's radiation directivity, of the room reverberation, and of the instrument's mouthpiece. In particular, we studied the effect of the first reflection on the recorded sound when the instrument is moved, by comparing recordings made in the anechoic chamber with and without a removable wood floor placed underneath the instrument and the microphone. Finally, we measured the auditorium response excited by a loudspeaker connected to a clarinet tube; the tube could be rotated, allowing its position to be set precisely according to the standard performer gestures found in the analyzed videos.

The sinusoidal partials of the recorded sounds were then extracted using an additive analysis procedure. Partials may exhibit amplitude modulations of more than 10 dB, even for low-frequency partials and even in non-expressive recordings. We show, through measurements and theoretical demonstrations, that these modulations are primarily caused by the performer's ancillary gestures coupled with the room's acoustical characteristics. Furthermore, we show that even in highly reverberant rooms, the modulations do not occur in the absence of ancillary gestures. The mouthpiece and the directional characteristics of the instrument are shown to have an influence only for large-amplitude, deliberate movements. Finally, these results are confirmed by a real-time clarinet synthesis model in which the instrument's radiation pattern and an early room-response model, based on measurements of the variable-acoustics auditorium, have been implemented. Sound and video examples of performances using this model will be presented during the conference.
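As an illustration of the additive analysis step, the following Python sketch tracks the amplitude of a single partial over time with a short-time Fourier transform. The recording file name, the partial frequency, and the analysis parameters are placeholders, not the data or software used in the study.

# Minimal sketch: track the amplitude of one partial over time and report
# its peak-to-trough modulation depth in dB. All inputs are placeholders.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

rate, x = wavfile.read("clarinet_take.wav")      # placeholder recording
x = x.astype(np.float64)
if x.ndim > 1:
    x = x.mean(axis=1)                           # mix to mono

f, t, Z = stft(x, fs=rate, nperseg=4096, noverlap=3072)

partial_hz = 440.0                               # assumed partial frequency
band = (f > partial_hz - 50) & (f < partial_hz + 50)
amp = np.abs(Z[band]).max(axis=0)                # peak magnitude near the partial

amp_db = 20 * np.log10(amp / amp.max() + 1e-12)
print("peak-to-trough modulation: %.1f dB" % (amp_db.max() - amp_db.min()))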
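The coupling between ancillary gesture and room acoustics can also be illustrated with a minimal image-source calculation: the direct sound interferes with a single floor reflection, and as the bell moves, the reflected path length changes and the received level of a partial fluctuates. The geometry, frequency, and reflection coefficient below are illustrative assumptions, not the measured auditorium data.

# Minimal sketch (our illustration, not the authors' model): interference of
# the direct sound with one floor reflection. An ancillary tilt gesture that
# moves the bell up and down changes the reflected path length, so the
# received level of a partial swings by several dB.
import numpy as np

c = 343.0                       # speed of sound, m/s
f = 880.0                       # assumed partial frequency, Hz
k = 2 * np.pi * f / c           # wavenumber
mic = np.array([2.0, 1.2])      # microphone (x, height), m
refl = 0.8                      # assumed floor reflection coefficient

def level_db(src):
    image = src * np.array([1.0, -1.0])          # image source below the floor
    d1 = np.linalg.norm(mic - src)               # direct path length
    d2 = np.linalg.norm(mic - image)             # reflected path length
    p = np.exp(-1j * k * d1) / d1 + refl * np.exp(-1j * k * d2) / d2
    return 20 * np.log10(abs(p))

# Bell height swept over a plausible ancillary-gesture range (0.6-1.0 m).
heights = np.linspace(0.6, 1.0, 200)
levels = np.array([level_db(np.array([0.0, h])) for h in heights])
print("level swing over the gesture: %.1f dB" % (levels.max() - levels.min()))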
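Finally, a rough sketch of the rendering idea behind the real-time synthesis model: a synthesized partial receives a gain from a simple directivity pattern driven by a slow gesture signal, plus one early reflection. This is our own simplified illustration; the directivity pattern, gesture trajectory, and reflection parameters of the actual model come from the measurements described above and are not reproduced here.

# Hedged sketch of the rendering structure: directivity gain driven by a
# slow gesture signal, plus a single fixed early reflection. All values are
# illustrative placeholders.
import numpy as np
from scipy.io import wavfile

rate = 44100
t = np.arange(int(rate * 2.0)) / rate            # two seconds of audio

# Slow ancillary gesture: the bell tilts sinusoidally at about 0.5 Hz.
tilt = 0.6 * np.sin(2 * np.pi * 0.5 * t)         # radians, illustrative range

# Crude directivity: more level when the bell points toward the microphone.
gain = 0.5 * (1.0 + np.cos(tilt))

dry = gain * np.sin(2 * np.pi * 880.0 * t)       # one gesture-gated partial

# One early floor reflection: fixed delay and attenuation (placeholders).
delay = int(0.004 * rate)                        # about 4 ms
wet = dry.copy()
wet[delay:] += 0.6 * dry[:-delay]

wavfile.write("gesture_partial_demo.wav", rate,
              (0.9 * wet / np.abs(wet).max()).astype(np.float32))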