SocialSense: A Collaborative Mobile Platform for Speaker and Mood Identification

We present SocialSense, a collaborative smartphone-based speaker and mood identification and reporting system that uses a user's voice to detect and log their speaking and mood episodes. SocialSense works collaboratively with other nearby phones running the app, periodically exchanging speaking and mood vectors with the other users present in a social interaction setting, thus keeping track of the global speaking episodes and moods of all users. In addition, it employs a novel event-adaptive dynamic classification scheme for speaker identification that updates the speaker classification model whenever one or more users enter or leave the scenario, ensuring the classifier always reflects current user presence. Evaluation shows that the dynamic classification scheme improves speaker identification accuracy by 30% compared to traditional static speaker identification systems, and yields a 10% to 43% performance boost in various noisy environments. SocialSense also improves mood classification accuracy by 4% to 20% over the baseline approaches. Energy consumption experiments show that a device running SocialSense achieves a daily battery lifetime of 10 to 14 hours.
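To make the event-adaptive dynamic classification idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes one Gaussian mixture model per enrolled speaker (a common choice for text-independent speaker identification) and rebuilds the active model set whenever the set of present users changes. The class name, method names, and parameters are illustrative assumptions.

```python
# Sketch of event-adaptive dynamic speaker classification (illustrative only):
# one GMM per enrolled speaker; the set of candidate models is rebuilt
# every time a user enters or leaves the interaction, so identification
# only ever compares against speakers who are actually present.
import numpy as np
from sklearn.mixture import GaussianMixture

class DynamicSpeakerClassifier:
    def __init__(self, n_components=8):
        self.n_components = n_components
        self.enrolled = {}  # user_id -> GMM trained on that user's voice features
        self.active = {}    # subset of models for users currently present

    def enroll(self, user_id, mfcc_frames):
        """Train a per-user GMM from enrollment audio features
        (rows = frames, columns = feature dimensions, e.g. MFCCs)."""
        gmm = GaussianMixture(n_components=self.n_components,
                              covariance_type="diag")
        gmm.fit(mfcc_frames)
        self.enrolled[user_id] = gmm

    def on_presence_change(self, present_ids):
        """Rebuild the active model set when users enter or leave."""
        self.active = {uid: self.enrolled[uid]
                       for uid in present_ids if uid in self.enrolled}

    def identify(self, mfcc_frames):
        """Return the present user whose GMM best explains the frames."""
        if not self.active:
            return None
        return max(self.active,
                   key=lambda uid: self.active[uid].score(mfcc_frames))

# Hypothetical usage with random arrays standing in for MFCC features:
clf = DynamicSpeakerClassifier()
clf.enroll("alice", np.random.randn(500, 13))
clf.enroll("bob", np.random.randn(500, 13))
clf.on_presence_change({"alice"})             # bob left; model set shrinks
print(clf.identify(np.random.randn(50, 13)))  # -> "alice"
```

Restricting the candidate set to present users is what drives the reported accuracy gain over static systems: the classifier never has to discriminate against absent speakers.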
