Unsupervised speaker segmentation framework based on sparse correlation feature

With increasing pressure at work and in study, mental health has become a major topic in current social research. Researchers can generally analyze psychological health states from social perception behavior, and the speech signal is an important research direction in this domain: extracting and fusing speech features allows the mental health of social groups to be assessed objectively. This requires an efficient speech segmentation algorithm. In this paper, we present a new speech segmentation framework that combines a sparse correlation feature with a Hidden Markov Model (HMM) and Kullback-Leibler Divergence (KLD), and that has been shown to achieve higher accuracy. Specifically, the HMM is used to obtain the initial wearer's voice data. Experimental tests and comparisons with different segmentation methods have been conducted to verify the efficacy of the proposed unsupervised method, and very promising results have been obtained.
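To illustrate how a KLD criterion can locate a speaker change, the sketch below scores each frame by the symmetric KL divergence between Gaussians fitted to the windows on either side of it. This is not the paper's implementation: the diagonal-Gaussian model, the window length `win`, the function names, and the synthetic 4-dimensional features are all assumptions made for illustration, and the sparse correlation feature and HMM stage are not modeled here.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def symmetric_kld_curve(features, win=50, eps=1e-6):
    """Symmetric KLD between the windows left and right of each frame.

    features: array of shape (n_frames, n_dims); win is an assumed window length.
    Frames closer than `win` to either end keep a score of 0.
    """
    n = len(features)
    scores = np.zeros(n)
    for t in range(win, n - win):
        left, right = features[t - win:t], features[t:t + win]
        mu_l, var_l = left.mean(axis=0), left.var(axis=0) + eps
        mu_r, var_r = right.mean(axis=0), right.var(axis=0) + eps
        scores[t] = (gaussian_kl(mu_l, var_l, mu_r, var_r)
                     + gaussian_kl(mu_r, var_r, mu_l, var_l))
    return scores

# Synthetic stand-in for two speakers: the feature mean shifts at frame 200.
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 4))
speaker_b = rng.normal(3.0, 1.0, size=(200, 4))
features = np.vstack([speaker_a, speaker_b])

scores = symmetric_kld_curve(features)
change_point = int(np.argmax(scores))  # expected to fall near frame 200
```

The divergence peaks where the two windows cover different speakers, so in this setup the maximum of `scores` serves as a candidate segment boundary; a practical system would threshold or smooth the curve to handle multiple changes.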
