Supervised and unsupervised feature selection for inferring social nature of telephone conversations from their content

The ability to reliably infer the nature of telephone conversations opens up a variety of applications, ranging from designing context-sensitive user interfaces on smartphones to providing new tools for social psychologists and social scientists to study and understand the social life of different subpopulations within different contexts. Using a unique corpus of everyday telephone conversations collected from eight residences over the duration of a year, we investigate the utility of popular features, extracted solely from the content, in distinguishing business-oriented calls from others. Through feature selection experiments, we find that the discrimination can be performed robustly for a majority of the calls using a small set of features. Remarkably, features learned from unsupervised methods, specifically latent Dirichlet allocation, perform almost as well as those from supervised methods. The unsupervised clusters learned in this task show promise for finer-grained inference of the social nature of telephone conversations.
