The optimisation of speech recognition based on convolutional neural network

The convolutional neural network (CNN) as acoustic model is introduced into speech recognition system based on mobile computing. To improve speech accuracy, two optimised methods are proposed in speech recognition based on CNN. Firstly, aiming at the problem for existing pooling algorithms ignoring locally relevant characteristics of the speech data, a dynamic adaptive pooling (DA-pooling) algorithm is proposed in pooling layer of CNN model. DA-pooling algorithm calculates the Spearman correlation coefficient of the extracted data to determine data correlation, then selects appropriate pool strategy for different correlativity of data according to weight. Secondly, in order to solve traditional dropout hiding neurons node randomly, a dropout strategy based on sparseness is proposed in full-connected layer in CNN model. By adding a unit sparseness determination mechanism in the output stage of network unit, we can reduce the ratio of influence of smaller units in the model results, thereby improving the generalisation ability of the model. Experimental results show that these strategies can improve the performance of the acoustic models based on CNN.