Auditory gist: recognition of very short sounds from timbre cues.

Sounds such as the voice or musical instruments can be recognized on the basis of timbre alone. Here, sound recognition was investigated with severely reduced timbre cues. Short snippets of naturally recorded sounds were extracted from a large corpus. Listeners were asked to report a target category (e.g., sung voices) among other sounds (e.g., musical instruments). All sound categories covered the same pitch range, so the task had to be solved on timbre cues alone. The minimum duration for which performance was above chance was found to be short, on the order of a few milliseconds, with the best performance for voice targets. Performance was independent of pitch and was maintained when stimuli contained less than a full waveform cycle. Recognition was not generally better when the sound snippets were time-aligned with the sound onset compared to when they were extracted with a random starting time. Finally, performance did not depend on feedback or training, suggesting that the cues used by listeners in the artificial gating task were similar to those relevant for longer, more familiar sounds. The results show that timbre cues for sound recognition are available at a variety of time scales, including very short ones.
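The gating manipulation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study: the function names `gate_snippet` and `cycles_in_gate`, the Hann-shaped gate, the 44.1 kHz sample rate, the 220 Hz test tone, and the 2 ms gate are all hypothetical choices made for the example. It shows the two extraction modes (aligned with the sound onset versus a random starting time) and how a gate can hold less than one waveform cycle.

```python
import math
import random


def gate_snippet(signal, sample_rate, duration_ms, onset_aligned=True, rng=None):
    """Cut a short windowed snippet out of `signal` (a list of samples)."""
    n = max(1, round(sample_rate * duration_ms / 1000.0))
    if n > len(signal):
        raise ValueError("gate is longer than the signal")
    if onset_aligned:
        start = 0  # snippet starts at the sound onset
    else:
        rng = rng or random.Random()
        start = rng.randint(0, len(signal) - n)  # random starting time
    # Hann-shaped gate: an assumption, the abstract does not state the gate shape.
    if n == 1:
        window = [1.0]
    else:
        window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    return [signal[start + i] * window[i] for i in range(n)]


def cycles_in_gate(f0_hz, duration_ms):
    """Number of waveform cycles of a tone at `f0_hz` that fit in the gate."""
    return f0_hz * duration_ms / 1000.0


# Illustrative values only (not taken from the study).
sample_rate = 44100
f0 = 220.0
tone = [math.sin(2 * math.pi * f0 * t / sample_rate) for t in range(sample_rate)]

aligned = gate_snippet(tone, sample_rate, duration_ms=2.0, onset_aligned=True)
randomized = gate_snippet(tone, sample_rate, duration_ms=2.0,
                          onset_aligned=False, rng=random.Random(0))
n_cycles = cycles_in_gate(f0, 2.0)
```

With these example values the gate spans 88 samples and 0.44 of a cycle, so the snippet contains less than one full period of the tone; any recognition at such a duration could not rely on a complete waveform cycle.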
