MidiMe: Personalizing a MusicVAE model with user data

Training a custom deep neural network model like Music Transformer [3], MusicVAE [4] or SketchRNN [2] from scratch requires significant amounts of data (millions of examples), substantial compute resources (specialized hardware like GPUs/TPUs), and expertise in hyperparameter tuning. Without sufficient data, models either fail to produce realistic output (underfitting) or memorize the training examples and cannot generalize to produce varied outputs (overfitting); it would be like trying to learn all of music theory from a single song.
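To make the underfitting/overfitting tradeoff concrete, here is a small, self-contained sketch (plain NumPy, not Magenta code) that uses polynomial degree as a stand-in for model capacity. All names and data in it are illustrative: a too-simple model cannot fit even the training set, while a too-flexible model fits the few training points exactly and generalizes poorly.

```python
import numpy as np

# Toy data: a smooth "true" function (sin) observed with a little noise,
# but only 8 training examples -- far too few for a high-capacity model.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 3, 8)
y_train = np.sin(x_train) + rng.normal(0, 0.05, size=x_train.shape)
x_test = np.linspace(0, 3, 100)
y_test = np.sin(x_test)

def fit_and_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

underfit = fit_and_errors(0)  # too simple: poor fit even on training data
goodfit = fit_and_errors(3)   # reasonable capacity for this function
overfit = fit_and_errors(7)   # degree 7 through 8 points: exact memorization
```

The degree-7 fit drives training error to essentially zero by passing through every noisy point, which is memorization rather than learning; the constant (degree-0) fit has high error everywhere. The analogous failure modes appear when training a large music model on too little data.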