Acoustic Classification of Focus: On the Web and in the Lab

We present a new methodological approach that combines naturally occurring speech harvested from the web with speech data elicited in the laboratory. This proof-of-concept study examines focus sensitivity in English, a phenomenon in which the interpretation of particular grammatical constructions (e.g., the comparative) is sensitive to the location of prosodic prominence. Machine learning algorithms (support vector machines and linear discriminant analysis) and human perception experiments are used to cross-validate the web-harvested and lab-elicited speech. The results confirm the theoretical predictions for the location of prominence in comparative clauses, as well as the advantages of using both web-harvested and lab-elicited speech. The most robust acoustic classifiers include paradigmatic (i.e., un-normalized), non-intonational acoustic measures (duration and relative formant frequencies from single segments). These acoustic cues are also significant predictors of human listeners’ classifications, offering new evidence in the debate over whether prominence is mainly encoded by pitch or by other cues, and over the role that utterance normalization plays when looking at non-pitch cues such as duration.
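As a rough illustration of the classification step, the sketch below trains a two-class linear discriminant on one set of tokens and tests it on another, mirroring the idea of checking lab-elicited speech against web-harvested speech. It is not the authors' implementation. The two features (raw segment duration in ms and an F1/F2 ratio standing in for "relative formant frequencies"), the labels, the equal class priors, and every number are invented for illustration; only LDA is shown (no SVM), and the names `fit_lda`, `predict`, and `accuracy` are hypothetical.

```python
from statistics import mean

# Hypothetical two-feature tokens: (segment duration in ms, F1/F2 ratio).
# Label 1 = prominence on the target word, 0 = prominence elsewhere.
# All values below are invented for illustration, not data from the study.
Point = tuple[float, float]


def fit_lda(xs: list[Point], ys: list[int]) -> tuple[Point, float]:
    """Fit two-class Fisher LDA with a pooled covariance and equal priors."""
    groups = {c: [x for x, y in zip(xs, ys) if y == c] for c in (0, 1)}
    mu = {c: (mean(p[0] for p in g), mean(p[1] for p in g))
          for c, g in groups.items()}
    # Pooled within-class covariance (2x2, symmetric).
    s00 = s01 = s11 = 0.0
    for c, g in groups.items():
        for a, b in g:
            da, db = a - mu[c][0], b - mu[c][1]
            s00 += da * da
            s01 += da * db
            s11 += db * db
    dof = len(xs) - 2
    s00, s01, s11 = s00 / dof, s01 / dof, s11 / dof
    det = s00 * s11 - s01 * s01
    # Discriminant direction w = S^-1 (mu1 - mu0), using the 2x2 inverse.
    d0, d1 = mu[1][0] - mu[0][0], mu[1][1] - mu[0][1]
    w = ((s11 * d0 - s01 * d1) / det, (-s01 * d0 + s00 * d1) / det)
    # With equal priors, the boundary passes through the midpoint of the means.
    m0, m1 = (mu[0][0] + mu[1][0]) / 2, (mu[0][1] + mu[1][1]) / 2
    return w, w[0] * m0 + w[1] * m1


def predict(w: Point, threshold: float, x: Point) -> int:
    """Assign class 1 when the projection onto w exceeds the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


def accuracy(w: Point, threshold: float, xs: list[Point], ys: list[int]) -> float:
    """Proportion of tokens whose predicted label matches the true label."""
    hits = sum(predict(w, threshold, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)


# Train on "lab" tokens, then test on "web" tokens (both hypothetical).
lab_x = [(120.0, 0.40), (130.0, 0.42), (125.0, 0.38), (135.0, 0.41),
         (80.0, 0.30), (90.0, 0.33), (85.0, 0.31), (95.0, 0.29)]
lab_y = [1, 1, 1, 1, 0, 0, 0, 0]
web_x = [(118.0, 0.39), (128.0, 0.37), (88.0, 0.32), (92.0, 0.30)]
web_y = [1, 1, 0, 0]

w, threshold = fit_lda(lab_x, lab_y)
print("lab accuracy:", accuracy(w, threshold, lab_x, lab_y))
print("web accuracy:", accuracy(w, threshold, web_x, web_y))
```

The features are deliberately left un-normalized, so each token is compared directly with other tokens across conditions, which loosely corresponds to the paradigmatic measures described above; an utterance-normalized variant would rescale duration before fitting.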
