Enhancing support vector machine-based speech/music classification using conditional maximum a posteriori criterion

Support vector machines (SVMs) have been recognised as a promising technique in the field of pattern recognition, and one interesting application of this technique is speech/music classification. In this study, the authors propose a novel approach to improve SVM-based speech/music classification using the second-order conditional maximum a posteriori (CMAP) criterion. To do this, the authors first devise a method to estimate, from the SVM output, the a posteriori probability used to select between speech and music. This is achieved by employing a sigmoid function whose parameters are obtained by optimised training on data. The final speech/music classification is then obtained using the second-order CMAP criterion, in which the a posteriori probability depends not only on the current observation but also on the classification results of the two previous frames, thereby incorporating the substantial inter-frame correlation. Because conventional SVM optimisation techniques are used during the training phase, while the proposed technique is applied in the classification phase, the proposed approach can be developed and employed in parallel with other optimisation techniques. Experimental results show that the proposed algorithm yields better classification results than the conventional SVM decision rule for speech/music classification.
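
The two steps described above (mapping the SVM output to a posterior with a sigmoid, then conditioning the decision on the two previous frame decisions) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the sigmoid parameters nor the exact CMAP decision rule, so the names `sigmoid_posterior`, `cmap_classify`, the parameters `a` and `b`, and the table `cond_prior` are all hypothetical. The sketch further assumes that the sigmoid output approximates P(speech | current frame) under equal class priors, and that the current observation is conditionally independent of the previous decisions given the true class, so that the two terms can be combined as a product of odds.

```python
import math

def sigmoid_posterior(f, a=-2.0, b=0.0):
    """Map a raw SVM output f to an estimate of P(speech | f).

    Uses the sigmoid 1 / (1 + exp(a*f + b)). The values of a and b are
    placeholders; in practice they would be fitted on training data.
    """
    return 1.0 / (1.0 + math.exp(a * f + b))

def cmap_classify(svm_outputs, cond_prior, a=-2.0, b=0.0):
    """Frame-wise decisions (1 = speech, 0 = music) with two-frame memory.

    cond_prior[(d2, d1)] is an assumed estimate of
    P(speech at frame n | decision at n-2 is d2, decision at n-1 is d1).
    """
    decisions = []
    for f in svm_outputs:
        p = sigmoid_posterior(f, a, b)
        if len(decisions) < 2:
            # Not enough history yet: fall back to the memoryless rule.
            d = 1 if p >= 0.5 else 0
        else:
            prior = cond_prior[(decisions[-2], decisions[-1])]
            # Compare the unnormalised posteriors of the two hypotheses.
            speech_score = p * prior
            music_score = (1.0 - p) * (1.0 - prior)
            d = 1 if speech_score >= music_score else 0
        decisions.append(d)
    return decisions

# Illustrative values only (not taken from the paper).
example_prior = {(1, 1): 0.9, (0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5}
example_outputs = [1.0, 1.0, -0.2, 1.0]
smoothed = cmap_classify(example_outputs, example_prior)
```

With these illustrative values, the third frame has a slightly negative SVM output and would be labelled music by a memoryless threshold, but the two preceding speech decisions shift the decision to speech. Setting every entry of `cond_prior` to 0.5 recovers the memoryless rule.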