DeepSheet: A sheet music generator based on deep learning

Sheet music has long been regarded as one of the most effective media through which musicians, performers, and amateurs communicate with each other. It is also an intuitive way for non-professionals to learn to play a musical instrument or sing a song. However, not all composers are willing to share their sheet music, especially when it is protected by strict copyright regulations. Amateurs and novice musicians who cannot identify chords by ear therefore find it difficult to enjoy playing or singing other people's music. To give beginners an effective way to play or sing a song when no sheet music is available, we developed a sheet music generator, “DeepSheet”, based on deep learning techniques. The DeepSheet system consists of three components: (1) separation of the singing voice from the audio recording, (2) chord estimation on the background music, and (3) alignment of lyrics with musical beats. Experimental results on more than 150 songs by the Beatles and Queen indicate that DeepSheet generates sheet music with an accuracy of approximately 76%.
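
To make the role of the three components concrete, the following is a minimal sketch of how their outputs might be merged into a beat-aligned lead sheet. It is not the DeepSheet implementation: the names `BeatSlot`, `assemble_lead_sheet`, and `render`, the input formats, and the "N" no-chord label are all assumptions chosen for illustration, and the deep learning models that would produce the beat times, chord segments, and timed lyrics are not shown.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class BeatSlot:
    """One beat of the output sheet (hypothetical structure)."""
    time: float                      # beat onset in seconds
    chord: str = "N"                 # "N" = no chord detected (assumed label)
    words: list = field(default_factory=list)


def assemble_lead_sheet(beat_times, chord_segments, timed_words):
    """Merge per-component outputs into one slot per beat.

    beat_times:     sorted beat onsets in seconds
    chord_segments: (start, end, label) tuples from a chord estimator
    timed_words:    (onset, word) tuples from a lyrics aligner
    """
    slots = [BeatSlot(time=t) for t in beat_times]

    # Give each beat the chord whose segment covers the beat onset.
    for slot in slots:
        for start, end, label in chord_segments:
            if start <= slot.time < end:
                slot.chord = label
                break

    # Attach each word to the latest beat at or before its onset;
    # words sung before the first beat go to the first slot.
    for onset, word in timed_words:
        index = max(0, bisect_right(beat_times, onset) - 1)
        if slots:
            slots[index].words.append(word)
    return slots


def render(slots):
    """Format the slots as plain text, one beat per line."""
    return "\n".join(
        f"{s.time:6.2f}s  {s.chord:<6} {' '.join(s.words)}".rstrip()
        for s in slots
    )


if __name__ == "__main__":
    # Toy inputs standing in for the outputs of the three components.
    beats = [0.0, 0.5, 1.0, 1.5]
    chords = [(0.0, 1.0, "C"), (1.0, 2.0, "G")]
    words = [(0.1, "la"), (0.6, "di"), (1.2, "da")]
    print(render(assemble_lead_sheet(beats, chords, words)))
```

Keeping the merge step separate from the models means each component can be evaluated or replaced independently, provided it emits time-stamped output in the assumed format.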
