Detection of Demographics and Identity in Spontaneous Speech and Writing

This chapter focuses on the automatic identification of demographic traits and identity in both speech and writing. We address language use in the virtual world of online games and text entry on mobile devices in the form of chat, email, and nicknames, and demonstrate textual factors that correlate with demographics, such as age, gender, personality, and interaction style. We also present work on speaker identification in spontaneous language use, describing the state of the art in verification, feature extraction, modeling, and calibration across multiple environmental conditions. Finally, we bring speech and writing together to explore approaches to user authentication that span language in general. We discuss how speech-specific factors such as intonation and writing-specific features such as spelling, punctuation, and typing correction correlate with and predict one another as a function of users’ sociolinguistic characteristics.
