A Sound Modeling and Synthesis System Designed for Maximum Usability

A synthesis and sound-modeling system is introduced. The design philosophy is to be “good enough most of the time” in an extremely wide variety of real-world scenarios, at the possible expense of being the best in any one particular aspect, including speed and reliability. Sound design and synthesis system design goals and decisions are discussed, and we consider the appropriateness of using pure Java as an implementation language for such systems.

1 Design Constraints

There is a plethora of software synthesizers and sound development environments to choose from today, and each has its own strengths and weaknesses in meeting the many different design goals for a sound design and synthesis (SDS) system. Often the design goals support conflicting implementation decisions. For example, the goal of expressivity may support the invention of a new language oriented specifically toward sonic and musical tasks, while the goal of learnability may be better addressed by using a language already familiar to many people. We have developed an SDS system, called ASound, written in pure Java, with maximum usability as the primary design criterion. Common design objectives for SDS systems revolve around the following issues:

• Speed – for real-time performance and the shortest delay between input audio or control signals and audio output.
• Reliability – dependable delivery of uninterrupted audio to the output device.
• Expressivity – the ability for code to be written in the musical and/or sonic terms in which composers and sound designers think.
• Power – the range of available tools (e.g. a large number of unit generators, or sound-producing and filtering routines).
• Learnability – the system needs to be as usable as possible by musicians and sound designers, even if they are not expert programmers. This is part of the motivation for graphical interfaces such as MAX (Puckette, 1991).
• Fast development times – the time it takes to develop bug-free sound models should be minimized.
• Development support – a good integrated development and debugging environment.
• Usability in education – a combination of expressivity and learnability.
• Ubiquity – the system should be inexpensive and not require special-purpose hardware or software.
• Support for complexity – the ability to write richly structured sound models in readable code and to “hide” complexity in functions and objects.
• Absence of musical and sonic structure biases – the system should bias the user as little as possible as to the genre of music or sonic style. For example, it should be possible to integrate algorithms for event and sound generation.
• Extensibility – sound developers and users need to be able to extend the capabilities of an SDS system, since no particular one will ever meet all the needs of designers and composers. Furthermore, sound modeling is an active field, and an SDS system needs to be designed to grow as new needs and possibilities arise.
• Cross-platform potential – it should be possible to develop sound and musical objects on whichever platform is most convenient for the composer or developer, and run them on most others.
• Integrability – sound objects should be runnable from a maximum number of other application environments: code written in languages other than the one used to develop the sound object, MIDI controllers, and other applications such as sequencers, audio editors, graphical applications, games, and multimedia development environments such as Macromedia Director and Flash, or 3DS Max. These should be able to control the sound object and, if necessary, receive audio back from it.
• Maintainability – easy to upgrade and modify with the minimum amount of effort. This argues against graphical interfaces.
• Small bandwidth requirements – a concern when downloading the system and/or sound models is required, as for applets, interactive Macromedia applications, and online or downloadable games.
• Low security risk – client computers must be safe from the possibility of downloading malicious code.

2 Meeting Design Requirements with Java

By building the SDS system in pure Java, many of the above requirements are automatically met. The core of the system that addresses the sound design requirements per se is discussed below.

Addressing ubiquity and learnability, Java is a free, commonly used, and widely taught language. For its many thousands of programmers, only the specific library of classes for sound needs to be learned. For development support, there are commercially supported online manuals and extensive tutorial material available across the Web. There are free and commercial IDEs (integrated development environments) that support object viewing, graphical interface design, and state-of-the-art debugging tools. It is considerably faster to develop code in Java than in C or C++, something we now have the luxury to consider given the execution speeds that modern desktop computers are achieving.

Having the full power of a general-purpose language is important for several reasons. It helps circumvent biases built into the sound and musical construction process that are hard or impossible to work around in more constrained task-specific languages. It permits an elegant coding style and the possibility of developing with manageable complexity. Both the core ASound system and Sound Models run without modification on any platform with a modern Java VM. Because the system provides a core set of sound and musical classes for programming rather than a graphical coding environment, it is both easy to use and easy to maintain.

Design pressures on SDS systems that have been growing in importance have to do with the need to send applications, plug-ins, and/or sound models over the network. The core ASound system is under 60 KB, and sound models that don’t require audio file resources are typically 2 to 5 KB. These are manageable numbers for even the most memory-limited devices, and they make download time negligible.

A Java-based system also poses no security threat to clients. ASound sound models are executable Java byte code (not simply parameters for predefined synthesizers). For a core engine it is at least feasible to require end users to grant a one-time security certificate, but many different sound models are used in typical applications, and they can come from many different developers and vendors. This would create an insurmountable security problem for sound models written in C, for example.
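As an illustration of how a sound model delivered as Java byte code might be brought into a running application, the following sketch uses the standard Java class-loading mechanism. The class, method, and jar names are hypothetical; ASound’s actual loading code is not described in this paper.

    import java.net.URL;
    import java.net.URLClassLoader;

    // Illustrative sketch only: loads a hypothetical sound-model class that has
    // been delivered as Java byte code (e.g. downloaded in a jar file).
    public class ModelLoaderSketch {
        public static Object loadModel(String jarUrl, String className) throws Exception {
            // Make the jar's classes available to the running application.
            URLClassLoader loader = new URLClassLoader(new URL[] { new URL(jarUrl) });
            Class<?> modelClass = loader.loadClass(className);
            // Instantiate with the no-argument constructor; the application then
            // drives the model through the common sound-model interface.
            return modelClass.getDeclaredConstructor().newInstance();
        }
    }

For example, loadModel("http://example.com/models/windchimes.jar", "WindChimesModel") would fetch and instantiate a model without the application knowing anything about it beyond the shared interface (the URL and names here are purely illustrative).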
3 Meeting Design Constraints with Modular System Design

The central design unit of the system is the Sound Model. A standard interface affords Play(), Stop(), and Release() events, and continuous parameter access (setting and getting) both in natural units and in normalized floating-point [0,1] units. Rather than having model-specific and parameter-specific control methods, the interface methods take a parameter index that is retrieved from the sound model using the parameter’s string name, so that the control interface is the same for all sound models. Sound Models return audio through a call to a Generate() method that takes as arguments an empty buffer and the requested number of samples with which to fill it.

A separate object, the SoundManager, controls a back-end output engine with an audio buffer and a timer. The SoundManager manages a list of sound models by periodically calling their Generate() methods and summing the results into the audio output buffer. This backend is entirely separate from the sound model and need not be used at all; an application can manage the Generate() calls itself. This situation arises, for example, if an application has access to a machine-specific buffer architecture (e.g. Creative EAX buffers on Wintel environments). Another example of not using the SoundManager synthesis backend is a non-realtime application for creating audio files from a musical score. Such an application calls sound model Generate() methods at whatever (possibly irregular) intervals are appropriate for the time stamps of the events in the score, and concatenates the returned audio to the end of a file.

Input control is similarly separate from sound models. For example, sound models never contain MIDI-specific (or, even more confounding, graphical interface) code. An entirely separate MIDI synthesizer application manages MIDI input and its mapping to a list of sound models, and serves as a recipient for messages streaming from a commercial MIDI sequencer application. By separating the backend timer/buffer engine and the front-end control systems from the sound models, the sound models remain clean, small, and useful in a maximum variety of contexts.

4 Sound Model Design

A library of classes for standard structures and unit generators (e.g. oscillators, filters) is used to build a sound model. A key feature of the system is that event generation and audio generation are supported on an equal footing. The Sound Model Generate() method basically calls two methods in sequence: GenerateEvents() and GenerateAudio(). Both perform their computation up to a specified time corresponding to the length of the buffer fill requested from Generate(). The event generator uses the standard model interface (parameter changes, starts, stops, and releases, as described above) with an additional time stamp to send events to a “submodel”, which puts them on a queue that is managed with sample accuracy. If the model uses event generation for a submodel, then its audio generator typically takes responsibility for getting the audio from the submodel by calling the submodel’s Generate() method. The structure is shown in Figure 1.

4.1 Common Sound Model Structures

Single-event Audio Generation – this is the standard parameterized synthesizer paradigm that generates a single “event” in response to a “play” command and offers realtime parametric control. When the Generate() method on such a model is called, the Event Generator does nothing, and the Audio Generator returns the audio. The Audio Generator synthesis algorithm may have many dimensions
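To make the shape of this interface concrete, the following is a minimal sketch in Java of a sound-model base class and a trivial single-event model, assuming the methods map roughly onto the Play()/Stop()/Release(), parameter-index, and Generate()/GenerateEvents()/GenerateAudio() calls described above. The class and method names are illustrative and are not the actual ASound API.

    // Illustrative sketch only: names approximate the interface described in the
    // text and are not the actual ASound API.
    public abstract class SoundModelSketch {
        protected String[] paramNames = new String[0];   // set by concrete models

        // Event-style control shared by all models.
        public abstract void play();
        public abstract void stop();
        public abstract void release();

        // Parameter access is index-based; the index is looked up once by the
        // parameter's string name, so the control interface is identical for
        // every model.
        public int getParamIndex(String name) {
            for (int i = 0; i < paramNames.length; i++) {
                if (paramNames[i].equals(name)) return i;
            }
            return -1;
        }
        public abstract void setParam(int index, double naturalValue);       // natural units
        public abstract void setParamNormalized(int index, double value01);  // [0,1]

        // Fills buf with nSamples of audio: events are computed first, then audio,
        // both up to the time corresponding to the end of the requested buffer.
        public void generate(float[] buf, int nSamples) {
            generateEvents(nSamples);
            generateAudio(buf, nSamples);
        }
        protected void generateEvents(int nSamples) { /* no-op for single-event models */ }
        protected abstract void generateAudio(float[] buf, int nSamples);
    }

    // A minimal single-event model: a sine oscillator that sounds between play()
    // and stop()/release().
    class SineModelSketch extends SoundModelSketch {
        private static final double SAMPLE_RATE = 44100.0;
        private double phase = 0.0;
        private double freqHz = 440.0;
        private boolean playing = false;

        SineModelSketch() { paramNames = new String[] { "frequency" }; }

        public void play()    { playing = true; }
        public void stop()    { playing = false; }
        public void release() { playing = false; }

        public void setParam(int index, double naturalValue) {
            if (index == 0) freqHz = naturalValue;                      // Hz
        }
        public void setParamNormalized(int index, double value01) {
            if (index == 0) freqHz = 20.0 + value01 * (20000.0 - 20.0); // map [0,1] to Hz
        }

        protected void generateAudio(float[] buf, int nSamples) {
            for (int i = 0; i < nSamples; i++) {
                buf[i] = playing ? (float) Math.sin(2.0 * Math.PI * phase) : 0.0f;
                phase += freqHz / SAMPLE_RATE;
                if (phase >= 1.0) phase -= 1.0;
            }
        }
    }

A SoundManager-style backend would then periodically call generate() on each active model and sum the returned buffers into its audio output buffer; alternatively, as described in Section 3, an application can drive generate() itself.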