Voice activity detection in movies using multi-class deep neural networks

Detecting the presence or absence of speech is useful for automatic classification of movie genres, so accurate voice activity detection (VAD) techniques are needed. However, movies contain many kinds of noise, which makes it difficult to detect speech segments accurately. Deep learning has recently received increased attention in the speech processing field, and some VAD techniques based on deep neural networks (DNNs) have been proposed for clean speech conditions, where they showed better performance than conventional methods. The aim of this study is to improve VAD performance for movies by using DNNs. VAD is generally treated as a two-class classification problem, i.e., classification of speech and non-speech segments. However, the diverse noises in movies make it difficult to detect speech segments accurately with two-class DNNs. To solve this problem, we propose the use of multi-class DNNs for VAD in movies. In the experiments, we evaluated the proposed met...
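
The abstract does not say how the outputs of a multi-class DNN are turned into a speech/non-speech decision. One plausible reading, given here only as a minimal sketch, is to sum the posteriors of every class that contains speech and threshold the result. The class inventory, the helper names (`softmax`, `speech_posterior`, `vad_decisions`), the threshold of 0.5, and the example logits are all assumptions made for illustration, not details taken from the paper.

```python
import math

# Hypothetical class inventory: the abstract does not list the classes used.
CLASSES = ["speech", "speech_with_music", "speech_with_noise",
           "music", "noise", "silence"]
# Assumption: any class that contains speech counts as "speech" for VAD.
SPEECH_CLASSES = {"speech", "speech_with_music", "speech_with_noise"}


def softmax(logits):
    """Convert raw network outputs for one frame into class posteriors."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speech_posterior(logits):
    """Sum the posteriors of all speech-containing classes for one frame."""
    probs = softmax(logits)
    return sum(p for name, p in zip(CLASSES, probs) if name in SPEECH_CLASSES)


def vad_decisions(frame_logits, threshold=0.5):
    """Collapse multi-class outputs into one speech/non-speech flag per frame."""
    return [speech_posterior(logits) >= threshold for logits in frame_logits]


# Made-up logits for three frames, ordered as in CLASSES.
frames = [
    [4.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # dominated by clean speech
    [0.0, 0.0, 0.0, 4.0, 0.0, 0.0],  # dominated by music
    [0.0, 2.0, 2.0, 1.0, 0.0, 0.0],  # speech mixed with music or noise
]
decisions = vad_decisions(frames)
print(decisions)
```

Under this assumed mapping, a frame split between "speech with music" and "speech with noise" is still flagged as speech, because the decision depends on the total speech mass rather than on any single class winning.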