XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music

With the growing body of knowledge about music composition and the recent increase in demand, a growing number of companies and research institutes have begun to study automatic music generation. However, previous models have limitations when applied to song generation, which requires both a melody and an arrangement. Moreover, many factors critical to the quality of a song, such as chord progressions and rhythm patterns, are not well addressed. In particular, the problem of ensuring harmony in multi-track music remains underexplored. To this end, we present a focused study on pop music generation, in which we consider both the influence of chords and rhythm on melody generation and the harmony of the music arrangement. We propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track together with several accompaniment tracks played by different types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melodies guided by chord progressions. We then propose a Multi-Instrument Co-Arrangement Model (MICA) that uses multi-task learning for multi-track music arrangement. Finally, we conduct extensive experiments on a real-world dataset, and the results demonstrate the effectiveness of XiaoIce Band.
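To make the cross-generation idea behind CRMCG concrete, the following is a minimal sketch (not the authors' implementation): a chord progression is encoded, a rhythm pattern is decoded first, and the melody is then decoded conditioned on both the chords and the generated rhythm. The module names, GRU encoder-decoder choice, and vocabulary sizes are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of chord-conditioned rhythm/melody cross-generation.
# All hyperparameters and vocabulary sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class CrossGenerationSketch(nn.Module):
    def __init__(self, n_chords=24, n_rhythm=16, n_pitch=128, hidden=256):
        super().__init__()
        self.chord_emb = nn.Embedding(n_chords, hidden)
        self.rhythm_emb = nn.Embedding(n_rhythm, hidden)
        self.pitch_emb = nn.Embedding(n_pitch, hidden)
        self.chord_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.rhythm_dec = nn.GRU(hidden, hidden, batch_first=True)
        self.melody_dec = nn.GRU(hidden, hidden, batch_first=True)
        self.rhythm_out = nn.Linear(hidden, n_rhythm)
        self.pitch_out = nn.Linear(hidden, n_pitch)

    def forward(self, chords, rhythm_in, melody_in):
        # Encode the chord progression for the current period.
        _, chord_state = self.chord_enc(self.chord_emb(chords))
        # Step 1: decode a rhythm pattern conditioned on the chords.
        rhythm_h, _ = self.rhythm_dec(self.rhythm_emb(rhythm_in), chord_state)
        rhythm_logits = self.rhythm_out(rhythm_h)
        # Step 2: decode pitches conditioned on the rhythm decoder's final
        # state (which itself carries the chord information).
        rhythm_summary = rhythm_h[:, -1:, :].transpose(0, 1).contiguous()
        melody_h, _ = self.melody_dec(self.pitch_emb(melody_in), rhythm_summary)
        return rhythm_logits, self.pitch_out(melody_h)

# Toy usage: batch of 2 periods, 4 chords, 8 rhythm steps, 8 melody steps.
model = CrossGenerationSketch()
rl, ml = model(torch.randint(24, (2, 4)),
               torch.randint(16, (2, 8)),
               torch.randint(128, (2, 8)))
print(rl.shape, ml.shape)  # (2, 8, 16) and (2, 8, 128)
```

MICA would extend this picture with one decoder per instrument track and information shared across the decoders at every step (multi-task learning), so that the accompaniment tracks stay harmonically consistent with the melody; that sharing mechanism is omitted from the sketch above.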
