Comparison of multilayer and radial basis function neural networks for text-dependent speaker recognition

This paper compares multilayer perceptrons (MLPs) trained with backpropagation and radial basis function (RBF) neural networks for the task of text-dependent speaker recognition. Ten classifier networks were generated for each of 20 male speakers using randomly generated training sets consisting of 6 true-speaker utterances and 19 false-speaker utterances (one from each of the false speakers). The resulting networks were then used to assess verification and identification performance for each of the network architectures. The results clearly indicate that the choice of true- and false-speaker utterances used in the training set has a crucial effect on the success of the classifier. The overall superiority generally reported for RBF networks over MLPs would appear to be due to the RBF network's reduced sensitivity to a poor training set, compared with an MLP trained on the same set. When both networks are presented with their "best" training sets, however, the RBF network still significantly outperforms the MLP.
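
The training-set construction described above can be illustrated with a minimal sketch. It is not the authors' code. The speaker, network, and utterance counts per set come from the abstract; the number of recorded utterances per speaker (UTTS_PER_SPEAKER), the (speaker, utterance) index representation, and the function names are assumptions made only for illustration.

```python
import random

NUM_SPEAKERS = 20       # from the abstract
NETS_PER_SPEAKER = 10   # from the abstract
TRUE_PER_SET = 6        # from the abstract
UTTS_PER_SPEAKER = 10   # assumption: not stated in the abstract


def make_training_set(true_speaker, rng):
    """Return (positives, negatives) as lists of (speaker, utterance) index pairs."""
    # Six distinct utterances drawn at random from the true speaker.
    positives = [(true_speaker, u)
                 for u in rng.sample(range(UTTS_PER_SPEAKER), TRUE_PER_SET)]
    # One randomly chosen utterance from each of the 19 other speakers.
    negatives = [(s, rng.randrange(UTTS_PER_SPEAKER))
                 for s in range(NUM_SPEAKERS) if s != true_speaker]
    return positives, negatives


def make_all_training_sets(seed=0):
    """Build one random training set per (speaker, network index) pair."""
    rng = random.Random(seed)
    return {(spk, k): make_training_set(spk, rng)
            for spk in range(NUM_SPEAKERS)
            for k in range(NETS_PER_SPEAKER)}


training_sets = make_all_training_sets()
```

Each of the 200 resulting sets would train one MLP and one RBF network, so that the two architectures can be compared on identical data.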