Computing auditory perception

This paper reviews the ingredients of computing auditory perception. At the basic level is neurophysiology, which is abstracted into artificial neural networks (ANNs) and, augmented by statistics, extended into machine learning. At the high level are cognitive models derived from psychoacoustics, especially Gestalt principles. The gap between neuroscience and psychoacoustics has to be filled by numerics, statistics, and heuristics. Computerised auditory models have a broad and diverse range of applications: hearing aids and implants, compression in audio codecs, automated music analysis, music composition, interactive music installations, and information retrieval from large databases of music samples.
