Perception of acoustic scale and size in musical instrument sounds

Natural sounds carry information about the size of their sources. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes along the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to musical sounds. The first experiment shows that listeners can reliably discriminate the scale of musical instrument sounds, although not quite as well as the scale of voices. The second experiment shows that listeners can recognize the family of an instrument sound that has been modified in pitch and scale beyond the range of normal experience. We conclude that the processing of acoustic scale in music perception is very similar to its processing in speech perception.
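The link between vocal-tract length and formant frequency can be illustrated with a minimal sketch. It assumes the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips (a quarter-wave resonator), with resonances at F_n = (2n - 1)c / (4L) and a speed of sound c of about 350 m/s. The tube model, the 17.5 cm length, and the helper names `tube_formants` and `log_frequency_shift` are illustrative assumptions, not the method used in the paper; real vocal tracts are not uniform tubes.

```python
import math

# Assumed speed of sound in warm, humid air (m/s); a textbook approximation.
SPEED_OF_SOUND = 350.0


def tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND):
    """Resonant frequencies (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wave resonator: F_n = (2n - 1) * c / (4 * L). This is an idealized
    stand-in for a vocal tract (hypothetical helper, not from the paper).
    """
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]


def log_frequency_shift(formants_a, formants_b):
    """Log-frequency difference between corresponding formants of two tubes.

    For uniform tubes every formant shifts by the same amount, log(L_b / L_a),
    so a change in size acts as a rigid shift along a log-frequency axis.
    """
    return [math.log(fa) - math.log(fb) for fa, fb in zip(formants_a, formants_b)]


short_tract = tube_formants(0.175)        # ~17.5 cm, roughly adult-sized (assumption)
long_tract = tube_formants(0.175 * 1.2)   # same shape, 20% longer
print(short_tract)
print(log_frequency_shift(short_tract, long_tract))
```

With these assumptions the shorter tube resonates at about 500, 1500 and 2500 Hz, and lengthening it by 20% lowers every formant by the same factor of 1.2. In this idealization the formant pattern keeps its shape while its position on a log-frequency axis changes, which is one way to picture the separation between the shape of a resonator and its acoustic scale.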
