Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages

Temporal regularities in speech, such as interdependencies in the timing of speech events, are thought to scaffold the early acquisition of the building blocks of speech. By providing online cues to the location and duration of upcoming syllables, temporal structure may aid the segmentation and clustering of continuous speech into separable units. This hypothesis tacitly assumes that learners exploit predictability in the temporal structure of speech. Existing measures of speech timing tend to focus on first-order regularities among adjacent units and are overly sensitive to idiosyncrasies in the data they describe. Here, we compare several statistical methods on a sample of 18 languages, testing whether syllable occurrence is predictable over time. Rather than looking for differences between languages, we look for temporal predictability that is present across languages and that a language learner could exploit, using clearly defined acoustic, rather than orthographic, measures. First, we analyse distributional regularities using two novel techniques: a Bayesian ideal learner analysis and a simple distributional measure. Second, we model higher-order temporal structure (regularities arising in an ordered series of syllable timings), testing the hypothesis that non-adjacent temporal structures may explain the gap between subjectively perceived temporal regularities and the absence of universally accepted lower-order objective measures. Together, our analyses provide limited evidence for predictability at different time scales, though higher-order predictability is difficult to infer reliably. We conclude that temporal predictability in speech may well arise from a combination of individually weak perceptual cues at multiple structural levels, but it is challenging to pinpoint.
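The abstract separates distributional regularities, which ignore the order of syllable timings, from higher-order structure, which depends on that order. The sketch below is a generic illustration of that distinction only; it is not the paper's Bayesian ideal learner analysis or its distributional measure. It assumes syllable onset times (in seconds) are already available, summarises the inter-onset intervals with an order-free statistic (the coefficient of variation, chosen here purely for illustration) and with the sample autocorrelation at a given lag, and compares the original series with a shuffled copy. The function names and the onset data are hypothetical.

```python
import itertools
import random
import statistics


def inter_onset_intervals(onsets):
    """Durations between successive syllable onsets (hypothetical helper)."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def coefficient_of_variation(intervals):
    """Order-free (distributional) summary: population std / mean."""
    return statistics.pstdev(intervals) / statistics.fmean(intervals)


def autocorrelation(intervals, lag):
    """Sample autocorrelation at `lag`; unlike the CV, it depends on order."""
    n = len(intervals)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(intervals)")
    mean = statistics.fmean(intervals)
    denom = sum((x - mean) ** 2 for x in intervals)
    num = sum((intervals[t] - mean) * (intervals[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


# Hypothetical onsets whose intervals alternate long-short (0.3 s, 0.1 s).
pattern = [0.3, 0.1] * 20
onsets = [0.0] + list(itertools.accumulate(pattern))
iois = inter_onset_intervals(onsets)

# Shuffling keeps the interval distribution but destroys its ordering.
shuffled = iois[:]
random.Random(0).shuffle(shuffled)

print(round(coefficient_of_variation(iois), 3))      # 0.5
print(round(coefficient_of_variation(shuffled), 3))  # 0.5
print(round(autocorrelation(iois, 1), 3))            # -0.975
print(round(autocorrelation(iois, 2), 3))            # 0.95
```

Both series have the same coefficient of variation, so an order-free measure cannot tell them apart, whereas the strongly negative lag-1 and positive lag-2 autocorrelations of the original series vanish once the intervals are shuffled. This is the kind of ordered structure that the higher-order analyses target.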
