Paper Information - Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.
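
As an illustration of the piano-roll representation mentioned above, the following is a minimal sketch, not the authors' code. It assumes each time step is given as a list of active MIDI pitch numbers and that the pitch range covers the 88 piano keys (21 to 108); the helper name `to_piano_roll` and the example chords are hypothetical.

```python
# Assumed pitch range: the 88 piano keys, MIDI 21 (A0) to 108 (C8).
LOW, HIGH = 21, 108


def to_piano_roll(chords, low=LOW, high=HIGH):
    """Hypothetical helper: turn a list of chords into a binary piano-roll.

    Each chord is a list of MIDI pitches sounding at one time step. The
    result has one row per time step and one column per pitch; an entry
    is 1 if that pitch is active at that step and 0 otherwise.
    """
    roll = []
    for chord in chords:
        frame = [0] * (high - low + 1)
        for pitch in chord:
            if low <= pitch <= high:  # silently skip out-of-range pitches
                frame[pitch - low] = 1
        roll.append(frame)
    return roll


# Illustrative input: C major, F major, then a silent step.
chords = [[60, 64, 67], [60, 65, 69], []]
roll = to_piano_roll(chords)
```

Each row of `roll` is a high-dimensional binary vector in which several notes can be active at once, which is why a model of such sequences needs a distribution over whole vectors at each time step rather than over a single symbol.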

[1] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[2] Paul Smolensky, et al. Information processing in dynamical systems: foundations of harmony theory, 1986.

[3] Michael C. Mozer, et al. Neural Network Music Composition by Prediction: Exploring the Benefits of Psychoacoustic Constraints and Multi-scale Processing, 1994, Connect. Sci.

[4] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.

[5] Ali Taylan Cemgil, et al. Bayesian Music Transcription, 1997.

[6] Samy Bengio, et al. Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks, 1999, NIPS.

[7] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.

[8] Jürgen Schmidhuber, et al. Finding temporal structure in music: blues improvisation with LSTM recurrent networks, 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[9] Jeremy Pickens, et al. Polyphonic music modeling with random fields, 2003, MULTIMEDIA '03.

[10] Christopher K. I. Williams, et al. Harmonising Chorales by Probabilistic Inference, 2004, NIPS.

[11] Geoffrey E. Hinton, et al. Exponential Family Harmoniums with an Application to Information Retrieval, 2004, NIPS.

[12] Jovan Popović, et al. Style translation for human motion, 2005, ACM Trans. Graph.

[13] Geoffrey E. Hinton, et al. Modeling Human Motion Using Binary Latent Variables, 2006, NIPS.

[14] Daniel P. W. Ellis, et al. A Discriminative Model for Polyphonic Piano Transcription, 2007, EURASIP J. Adv. Signal Process.

[15] Geoffrey E. Hinton, et al. Learning Multilevel Distributed Representations for High-Dimensional Sequences, 2007, AISTATS.

[16] DeLiang Wang, et al. Pitch Detection in Polyphonic Music using Instrument Tone Models, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[17] Yoshua Bengio, et al. Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.

[18] Geoffrey E. Hinton, et al. The Recurrent Temporal Restricted Boltzmann Machine, 2008, NIPS.

[19] Ruslan Salakhutdinov, et al. On the quantitative analysis of deep belief networks, 2008, ICML '08.

[20] Benjamin Schrauwen, et al. A hierarchy of recurrent networks for speech recognition, 2009, NIPS 2009.

[21] Jean-François Paiement, et al. Probabilistic models for melodic prediction, 2009, Artif. Intell.

[22] Mert Bay, et al. Evaluation of Multiple-F0 Estimation and Tracking Systems, 2009, ISMIR.

[23] Ilya Sutskever, et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.

[24] Hugo Larochelle, et al. The Neural Autoregressive Distribution Estimator, 2011, AISTATS.

[25] Juhan Nam, et al. A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations, 2011, ISMIR.

[26] Geoffrey E. Hinton, et al. Conditional Restricted Boltzmann Machines for Structured Output Prediction, 2011, UAI.

[27] Silvio Savarese, et al. Structured Recurrent Temporal Restricted Boltzmann Machines, 2014, ICML.