Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors.

Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German each produced 16 spontaneous utterances (256 in total), which were transcribed; all speakers then read every transcript (4096 sentences; 16 speakers × 256 sentences). Between-speaker variability was tested using analysis of variance with repeated measures on within-speaker factors. Results revealed strong and consistent between-speaker variability, while within-speaker variability as a function of articulation rate and linguistic characteristics was typically not significant. It was concluded that between-speaker variability of acoustically measurable speech rhythm is strong and robust against various sources of within-speaker variability. Idiosyncratic articulatory movements were found to be the most plausible factor explaining between-speaker differences.
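
As a rough illustration of three of the duration-based measures named above, the following Python sketch computes %V, ΔV(ln), and ΔC(ln) from a list of labelled intervals. It assumes the common definitions: %V is the percentage of total duration that is vocalic, and ΔV(ln) and ΔC(ln) are the standard deviations of the natural-log-transformed vocalic and consonantal interval durations. The function name, the input format, and the use of the sample (n − 1) standard deviation are assumptions made for this example and are not taken from the study; Δpeak(ln) is omitted because it requires amplitude-envelope peak detection.

```python
import math
import statistics


def rhythm_metrics(intervals):
    """Compute %V, deltaV(ln), and deltaC(ln) for one utterance.

    `intervals` is a hypothetical input format: a list of
    (label, duration_in_seconds) pairs, where label is "V" for a
    vocalic interval and "C" for a consonantal interval.
    """
    vocalic = [dur for label, dur in intervals if label == "V"]
    consonantal = [dur for label, dur in intervals if label == "C"]
    total = sum(vocalic) + sum(consonantal)

    # Log-transform durations before taking the standard deviation.
    ln_v = [math.log(dur) for dur in vocalic]
    ln_c = [math.log(dur) for dur in consonantal]

    return {
        "percent_v": 100.0 * sum(vocalic) / total,
        # Sample standard deviation (n - 1); an assumption of this sketch.
        "delta_v_ln": statistics.stdev(ln_v),
        "delta_c_ln": statistics.stdev(ln_c),
    }


# Made-up durations, for illustration only.
example = [("C", 0.1), ("V", 0.1), ("C", 0.2), ("V", 0.3)]
metrics = rhythm_metrics(example)
```

Because the logarithm turns a uniform scaling of all durations into a constant offset, the two ln-based values in this sketch do not change when every interval is lengthened or shortened by the same factor, and %V is likewise unaffected by uniform scaling.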
