Fast recognition of musical sounds based on timbre.

Human listeners seem to have an impressive ability to recognize a wide variety of natural sounds. However, there is surprisingly little quantitative evidence to characterize this fundamental ability. Here, the speed and accuracy of musical-sound recognition were measured psychophysically with a rich but acoustically balanced stimulus set. The set comprised recordings of notes from musical instruments and sung vowels. In a first experiment, reaction times were collected for three target categories: voice, percussion, and strings. In a go/no-go task, listeners reacted as quickly as possible to members of a target category while withholding responses to distractors (a diverse set of musical instruments). Results showed near-perfect accuracy and fast reaction times, particularly for voices. In a second experiment, voices were recognized among strings and vice versa. Again, reaction times were faster for voices. In a third experiment, auditory chimeras were created to retain only the spectral or only the temporal features of the voice. Chimeras were recognized accurately, but not as quickly as natural voices. Altogether, the data suggest rapid and accurate neural mechanisms for musical-sound recognition based on selectivity to complex spectro-temporal signatures of sound sources.
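
As an illustration of how go/no-go data of this kind are commonly scored, the following minimal sketch computes hit rate, false-alarm rate, the sensitivity index d', and the median reaction time on hits. It is not the authors' analysis: the trial format, the function name `score_go_nogo`, and the log-linear correction for extreme rates are all assumptions made for this example.

```python
from statistics import NormalDist, median


def score_go_nogo(trials):
    """Score go/no-go trials (hypothetical format, assumed for this sketch).

    Each trial is a tuple (is_target, responded, rt_ms), where rt_ms is the
    reaction time in milliseconds, or None when no response was made.
    """
    n_targets = sum(1 for is_target, _, _ in trials if is_target)
    n_distractors = sum(1 for is_target, _, _ in trials if not is_target)
    hits = sum(1 for is_target, responded, _ in trials if is_target and responded)
    false_alarms = sum(
        1 for is_target, responded, _ in trials if not is_target and responded
    )

    # Log-linear correction (an assumption of this sketch): keeps both rates
    # strictly between 0 and 1 so the z-transform stays finite.
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_distractors + 1)

    # d' is the difference between the z-transformed hit and false-alarm rates.
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)

    # Reaction times are only meaningful for correct "go" responses (hits).
    hit_rts = [rt for is_target, responded, rt in trials if is_target and responded]
    median_rt = median(hit_rts) if hit_rts else None

    return {
        "hit_rate": hit_rate,
        "fa_rate": fa_rate,
        "d_prime": d_prime,
        "median_rt_ms": median_rt,
    }


# Made-up example data: four target trials and four distractor trials.
example_trials = [
    (True, True, 400.0),
    (True, True, 500.0),
    (True, True, 600.0),
    (True, False, None),
    (False, True, 450.0),
    (False, False, None),
    (False, False, None),
    (False, False, None),
]
result = score_go_nogo(example_trials)
print(result)
```

The median is used here rather than the mean because reaction-time distributions are typically right-skewed; other summary statistics would be equally valid choices.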
