Real-Time Auditory Models

The peripheral auditory system is a complex ensemble of mechanical and neural structures that have a profound influence on how we perceive sounds. As more became known about the physiology of these structures, computational models emerged to simulate their functions. Auditory models are now widely used in psychoacoustics to predict phenomena such as masking, loudness, pitch and roughness. A variety of implementations exist, each tuned to a specific need. In this paper, we argue that a real-time implementation of simplified auditory models can open up a new range of uses, both for music (estimation of perceptual attributes, visualisation, analysis-resynthesis) and for research (simulation of hearing impairments). We have developed an implementation of such models in Pure Data, starting with an auditory filterbank.

1. A BRIEF OVERVIEW OF THE PERIPHERAL AUDITORY SYSTEM

1.1. Anatomy and physiology

A sound is a pressure variation that reaches our ears. It has to be converted into neural activity in the brain before we can perceive it. The outer, middle and inner ear perform successive stages of this initial transduction, which shapes the features available to perception.

The outer ear and middle ear convert airborne vibrations into mechanical ones, which are then communicated to the fluids of the cochlea. Running along the cochlea is the basilar membrane, an elastic structure on which the mechanical vibrations set up travelling waves. The travelling waves have an envelope that depends on the frequency content of the sound: the displacement is maximal near the base of the membrane for high-frequency tones, and near the apex for low-frequency tones.

Lying on top of the basilar membrane is the organ of Corti. This organ contains inner and outer hair cells. When the basilar membrane is set into motion, the hairs of the hair cells are deflected and the probability of neural discharge increases in the corresponding auditory nerve fibres.
Inner hair cells mostly transmit information to the brain, whereas outer hair cells mostly receive information from the brain.

1.2. Tonotopic and temporal coding

The cochlea thus performs a dual coding of the features of incoming sounds. The basilar membrane mechanically encodes the frequency content of sounds in its profile of displacement. This first code has been termed tonotopy, as it codes tone by place. The second code is a temporal one. The probability of discharge in the auditory nerve follows the phase of the displacement of the basilar membrane. The precise temporal structure of the displacement at a particular place is thus encoded in the discharge patterns of the corresponding nerve fibres. This code is called phase locking.

1.3. Non-linear processes

There are many sources of non-linearity in the cochlea. Outer hair cells, for instance, are thought to participate in an active feedback loop that modifies the local properties of the encoding. Such non-linearities play a crucial role in the exquisite sensitivity and selectivity of the normal-hearing auditory system.

2. AUDITORY FILTERBANKS

2.1. Gammatone filters

There is a whole field of research devoted to cochlear modelling. The aim can be either to better understand the structure of the cochlea described above, through accurate mechanical modelling, or to reproduce its signal-processing characteristics at a functional level. Within this functional approach, an important tool has emerged in the form of the auditory filterbank.

Tonotopic and temporal coding in the cochlea provide the basis for a form of time-frequency analysis by the auditory system. An auditory filterbank, as the name indicates, is a set of filters that tries to reproduce the particularities of this time-frequency analysis. The precise shape and parameters of the filters vary from model to model, but they are all fitted to perceptual or physiological measures. We will here consider the “gammatone” auditory filterbank.
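To make the idea of an auditory filterbank more concrete, here is a minimal offline sketch in Python (not the Pure Data implementation discussed in this paper). It assumes the commonly used form of the gammatone impulse response, t^(n-1) exp(-2πbt) cos(2πf_c t), with the bandwidth b tied to the equivalent rectangular bandwidth (ERB) of Glasberg and Moore. The filter order of 4, the bandwidth factor of 1.019, the peak normalisation and all function names are conventional or illustrative choices, not values taken from this paper.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc (Hz).

    Assumption: Glasberg and Moore's formula, a common choice for gammatone models.
    """
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4, bw_factor=1.019):
    """Sampled gammatone impulse response, normalised to a peak amplitude of 1.

    order and bw_factor are conventional values, assumed here for illustration.
    """
    t = np.arange(int(duration * fs)) / fs          # time axis in seconds
    b = bw_factor * erb(fc)                         # bandwidth parameter in Hz
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)  # gamma envelope
    g = envelope * np.cos(2.0 * np.pi * fc * t)     # tone carrier at fc
    return g / np.max(np.abs(g))

def gammatone_filterbank(x, fs, centre_freqs):
    """Filter x through one gammatone channel per centre frequency.

    Returns an array of shape (number of channels, len(x)).
    Direct convolution is used for clarity; it is not a real-time method.
    """
    return np.stack([
        np.convolve(x, gammatone_ir(fc, fs))[: len(x)]
        for fc in centre_freqs
    ])
```

Feeding a 1 kHz tone to channels centred on 500, 1000 and 2000 Hz should give the strongest output in the middle channel, which illustrates the place-like (tonotopic) behaviour the filterbank is meant to capture. Each channel's waveform also retains the temporal fine structure of the input.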
The shape of gammatone filters was chosen to fit perceptual masking experiments [1]. The hypothesis behind the model was that auditory filters can be viewed as somewhat independent processing channels, each with a given centre frequency and selectivity.

[Figure: block diagram; recoverable labels: Auditory Filterbank, Sound stream, Non-linear processing]

In the case of