Timbre and the Perceptual Effects of Three Types of Data Reduction

Research conducted with the help of the computer in the last several years has allowed us to synthesize tones whose timbres are perceptually indistinguishable from those of many musical instruments. This success has often given us important information about the relationships between the psychological components of hearing and the physical attributes of sound. For musical purposes, an analysis/synthesis model of a real tone can be considered perfect if, when using data resulting from the analysis, we find that a resynthesized tone is perceptually indistinguishable from the real sound. Such a system can be realized, for example, by heterodyne filtering (Moorer 1973) or, better yet, with the phase vocoder (Portnoff 1976; Moorer 1978). However, defining a tone completely in order to reproduce exactly a given timbre leads to a four-dimensional representation, with amplitude, frequency, time, and phase as physical parameters. Usually phase does not appear as a perceptibly significant parameter because in most cases it can be significantly perturbed (by a reverberant room, for example) without audible loss of information. Nevertheless, with only three physical dimensions, the amount of data necessary for an individual tone is still considerable, since it consists of two time-varying functions (amplitude and frequency) for each partial. This is why additive synthesis, although capable of producing excellent results, is often considered by musicians to be of limited practical use: even if the computations are done quickly by dedicated hardware, the quantity of data is quite large even for a simple score.

Recently, multidimensional scaling techniques (Shepard 1962a; 1962b; Kruskal 1964a; 1964b; Carroll and Chang 1970; Benzécri et al. 1973a; 1973b) have provided unambiguous evidence that timbre is a multidimensional attribute of sound (Plomp 1970; Wessel 1973; 1978; Miller and Carterette 1975; Grey 1977). This could justify a priori the necessity for a large amount of data to define this attribute. The analyses of timbre just cited, however, indicate that the identification of timbre is related to few psychological factors. All the details of the time-varying functions (for amplitude and frequency) are probably not significant for the perception of timbre. Parallel to this development, the success of nonlinear synthesis techniques illustrated by frequency modulation (FM) synthesis (Chowning 1973) leads us to think that important simplifications can be made in the definition of sound without significant deterioration in timbre. By controlling a simple modulation index, we can simulate the timbre of different musical instruments in a very satisfactory (though discernible) manner.

Data reduction is thus not only of obvious interest for the synthesis, transformation, or transmission of sound; it also permits a deeper understanding of the truly relevant features of hearing (specifically, the invariable elements of sound perception). We have experimented with data reduction in the three dimensions defining sound, namely amplitude, frequency, and time. In this article we will discuss three approaches: (1) data reduction for the time-varying amplitude functions of each partial, (2) data reduction for the time-varying frequency functions of each partial, and (3) prediction of the starting and ending times of each partial.
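To make the data burden concrete, the following minimal sketch (in Python, with entirely hypothetical breakpoint values; it is not the code or data used in this study) resynthesizes a tone additively from one time-varying amplitude function and one time-varying frequency function per partial, here given as piecewise-linear breakpoint envelopes.

```python
"""Additive resynthesis from per-partial amplitude and frequency envelopes.

Illustrative sketch only: breakpoint values, sample rate, and envelope
shapes are assumptions, not analysis data from the article.
"""
import numpy as np

SR = 44100  # assumed sample rate in Hz


def synthesize(partials, duration):
    """Sum partials, each given as (amplitude breakpoints, frequency breakpoints)."""
    t = np.arange(int(duration * SR)) / SR
    out = np.zeros_like(t)
    for amp_bp, freq_bp in partials:
        # Interpolate the two time-varying functions onto the sample grid.
        amp = np.interp(t, [p[0] for p in amp_bp], [p[1] for p in amp_bp])
        freq = np.interp(t, [p[0] for p in freq_bp], [p[1] for p in freq_bp])
        # Integrate instantaneous frequency to obtain the running phase.
        phase = 2 * np.pi * np.cumsum(freq) / SR
        out += amp * np.sin(phase)
    return out


# A toy two-partial, one-second tone (values chosen only for illustration).
partials = [
    ([(0.0, 0.0), (0.05, 1.0), (0.9, 0.8), (1.0, 0.0)],   # amplitude envelope
     [(0.0, 440.0), (1.0, 439.0)]),                        # frequency envelope
    ([(0.0, 0.0), (0.08, 0.5), (0.9, 0.3), (1.0, 0.0)],
     [(0.0, 881.0), (1.0, 879.0)]),
]
tone = synthesize(partials, 1.0)
```

Even this toy tone needs two envelopes per partial; an analyzed instrument tone with many partials, each sampled at the analysis frame rate rather than at a few breakpoints, involves far more data, which is the burden the three reduction approaches listed above are meant to address. The piecewise-linear envelopes in the sketch merely illustrate one plausible form such a reduction could take.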
We will discuss the results concerning discrimination of these various modifications of timbre for 16 isolated tones, which represent a wide range of the traditional nonpercussive orchestral instruments.