Automatic speech discrete labels to dimensional emotional values conversion method
Dimensional emotion estimation (e.g. arousal and valence) from spontaneous and realistic expressions has drawn increasing commercial attention. However, applying dimensional emotion estimation technology remains challenging owing to issues such as manual annotation and evaluation. In this work, the authors introduce an automatic annotation and emotion prediction model. The automatic annotation is performed in three main steps: (i) label initialisation, (ii) automatic label annotation, and (iii) label optimisation. The approach has been validated on databases in different languages covering different types of emotional expression, including spontaneous, acted and induced expressions. Compared with using the predicted labels without optimisation, the optimisation step improves the concordance correlation coefficient (CCC) by an average of 0.104 for arousal and 0.051 for valence. Furthermore, the standard deviation between the annotated values and the ground truth is reduced to an average of 0.44 for arousal and 0.34 for valence. Finally, the CCC values achieved by the proposed model reach 0.58 for arousal and 0.28 for valence, further verifying its feasibility and reliability. The proposed method can reduce labour-intensive and time-consuming manual annotation work.
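The evaluation metric used throughout, the concordance correlation coefficient (CCC), measures both the correlation and the agreement in scale and location between predicted and reference emotion values. A minimal sketch of its standard definition is shown below; the `ccc` helper and the sample arrays are illustrative and not taken from the paper:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two sequences.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance, as is standard
    for this metric. Returns 1.0 for perfect agreement.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances (ddof=0)
    cov = np.mean((x - mx) * (y - my))   # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

# Illustrative arousal annotations: identical sequences agree perfectly,
# while a shifted or rescaled prediction is penalised even if correlated.
ref = [0.1, 0.4, 0.5, 0.8]
print(ccc(ref, ref))                     # perfect agreement -> 1.0
print(ccc(ref, [v + 0.3 for v in ref]))  # correlated but shifted -> < 1.0
```

Unlike the Pearson correlation, CCC penalises systematic bias and scale differences, which is why it is the usual choice for comparing automatically generated dimensional labels against ground-truth annotations.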