An HMM-based pre-training approach for sequential data

Much recent research has highlighted the critical role of unsupervised pre-training in improving the performance of neural network models. However, extending those architectures to the temporal domain introduces additional issues, which often make it hard to obtain good performance in a reasonable time. We propose a novel approach to pre-training sequential neural networks in which a simpler, approximate distribution generated by a linear model is first used to drive the weights into a better region of the parameter space. After this smooth distribution has been learned, the network is fine-tuned on the more complex real dataset. The benefits of the proposed method are demonstrated on a prediction task using two datasets of polyphonic music, and the general validity of the strategy is shown by applying it to two different recurrent neural network architectures.
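The two-phase schedule described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the HMM parameters `pi`, `A` and `B` are hand-picked rather than estimated, the "real" sequences are random placeholders, and a one-layer logistic next-frame predictor stands in for the recurrent networks used in the paper. Only the order of the steps (sample from the HMM, pre-train on the samples, fine-tune on the real data) reflects the text.

```python
import numpy as np

# Hypothetical HMM with 2 hidden states and 4 binary outputs per frame
# (think of 4 notes that can be on or off). Values are illustrative only.
pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.9, 0.1],                  # state transition matrix
              [0.2, 0.8]])
B = np.array([[0.8, 0.1, 0.7, 0.1],        # per-state Bernoulli emission probabilities
              [0.1, 0.9, 0.1, 0.6]])


def sample_hmm(pi, A, B, length, rng):
    """Sample one binary sequence of shape (length, n_features) from the HMM."""
    state = rng.choice(len(pi), p=pi)
    frames = []
    for _ in range(length):
        frames.append((rng.random(B.shape[1]) < B[state]).astype(float))
        state = rng.choice(len(pi), p=A[state])
    return np.stack(frames)


def train(W, sequences, epochs, lr):
    """Gradient descent on the cross-entropy of next-frame prediction.

    Stand-in model (an assumption, not a recurrent network):
    p(next frame) = sigmoid(current frame @ W).
    """
    for _ in range(epochs):
        for seq in sequences:
            x, y = seq[:-1], seq[1:]
            p = 1.0 / (1.0 + np.exp(-(x @ W)))
            W -= lr * (x.T @ (p - y)) / len(x)
    return W


rng = np.random.default_rng(0)

# Phase 1: pre-train on sequences sampled from the simpler HMM distribution.
hmm_samples = [sample_hmm(pi, A, B, 30, rng) for _ in range(20)]
W = train(np.zeros((4, 4)), hmm_samples, epochs=5, lr=0.5)

# Phase 2: fine-tune the same weights on the real data (random placeholder here).
real_data = [(rng.random((30, 4)) < 0.3).astype(float) for _ in range(20)]
W = train(W, real_data, epochs=5, lr=0.1)
```

The point of the sketch is that the second call to `train` starts from the weights produced by the first call rather than from a fresh initialization; how the HMM itself is fitted is left outside the sketch.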