Origins of scale invariance in vocalization sequences and speech

To communicate effectively, animals need to detect temporal vocalization cues that vary over several orders of magnitude in their amplitude and frequency content. This large range of temporal cues is evident in the power-law, scale-invariant relationship between the power of temporal fluctuations in sounds and the sound modulation frequency (f). Though various forms of scale invariance have been described for natural sounds, the origins and implications of this scale-invariant phenomenon remain unknown. Using animal vocalization sequences, including continuous human speech, and a stochastic model of temporal amplitude fluctuations, we demonstrate that temporal acoustic edges are the primary acoustic cue accounting for the scale-invariant phenomenon. The modulation spectra of the vocalization sequences and of the model both exhibit a dual-regime lowpass structure, with a flat region at low modulation frequencies and a scale-invariant 1/f^2 trend at high modulation frequencies. Moreover, we find a time-frequency tradeoff between the average vocalization duration of each vocalization sequence and the cutoff frequency beyond which scale-invariant behavior is observed. These results indicate that temporal edges are universal features responsible for scale invariance in vocalized sounds. This is significant because temporal acoustic edges are perceptually salient, and the auditory system could exploit such statistical regularities to minimize redundancies and generate compact neural representations of vocalized sounds.
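The link between edges, the 1/f^2 trend, and the duration-dependent cutoff can be illustrated with a minimal sketch. It is not the authors' model: it assumes each vocalization is an idealized rectangular amplitude pulse, so the spectrum of one pulse of duration T is T^2 sinc^2(fT), and it averages that expression over randomly drawn durations. The gamma distribution, the 0.1 s mean duration, and the helper names `mean_pulse_spectrum`, `loglog_slope`, and `knee_frequency` are all hypothetical choices made for illustration. The knee is defined here, also as an assumption, as the frequency where the flat low-frequency level E[T^2] meets the high-frequency asymptote 1/(2 pi^2 f^2).

```python
import numpy as np

def mean_pulse_spectrum(freqs, durations):
    """Average energy spectrum of unit-height rectangular pulses.

    One pulse of duration T has energy spectrum
    |sin(pi f T) / (pi f)|^2 = T^2 sinc^2(f T).
    """
    f = np.asarray(freqs, dtype=float)[:, None]
    T = np.asarray(durations, dtype=float)[None, :]
    # np.sinc(x) is the normalized sinc, sin(pi x) / (pi x).
    return np.mean((T * np.sinc(f * T)) ** 2, axis=1)

def loglog_slope(freqs, spectrum):
    """Least-squares slope of log10(spectrum) versus log10(frequency)."""
    return np.polyfit(np.log10(freqs), np.log10(spectrum), 1)[0]

def knee_frequency(durations):
    """Frequency where the flat level E[T^2] equals 1 / (2 pi^2 f^2).

    Illustrative definition of the cutoff (an assumption of this sketch).
    """
    return 1.0 / (np.pi * np.sqrt(2.0 * np.mean(np.square(durations))))

rng = np.random.default_rng(0)
mean_duration = 0.1  # seconds; hypothetical value
durations = rng.gamma(shape=4.0, scale=mean_duration / 4.0, size=2000)

low_f = np.logspace(-2, -1, 50)    # far below 1 / mean_duration
high_f = np.logspace(2, 3, 200)    # far above 1 / mean_duration

low_slope = loglog_slope(low_f, mean_pulse_spectrum(low_f, durations))
high_slope = loglog_slope(high_f, mean_pulse_spectrum(high_f, durations))

# Doubling every duration should halve the knee frequency.
knee_ratio = knee_frequency(durations) / knee_frequency(2.0 * durations)

print(f"low-frequency slope:  {low_slope:.3f}")
print(f"high-frequency slope: {high_slope:.3f}")
print(f"knee ratio after doubling durations: {knee_ratio:.3f}")
```

Under these assumptions the fitted slope is close to 0 in the low band and close to -2 in the high band, because averaging sin^2(pi f T) over many durations leaves roughly 1/(2 pi^2 f^2). Since the knee in this sketch depends only on E[T^2], scaling all durations by a constant scales the knee by its inverse, which mirrors the time-frequency tradeoff described above.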
