Bayesian nonparametric music parser

This paper proposes a novel representation of music for similarity-based music information retrieval, together with a method that converts an input polyphonic audio signal into that representation. The representation is a 2-dimensional tree structure in which each node encodes a musical note and the two dimensions correspond to time and to simultaneously sounding notes, respectively. Because the temporal structure and the synchrony of simultaneous events are both essential in music, our representation reflects them explicitly. Conventional approaches to deriving a music representation from audio usually perform note extraction before structure analysis, but accurate note extraction has proven difficult. In the proposed method, note extraction and structure estimation are performed simultaneously, so the optimal solution is obtained with a unified inference procedure. Specifically, we propose an extended 2-dimensional infinite probabilistic context-free grammar and a sparse factor model for spectrogram analysis. We present an efficient inference algorithm based on Markov chain Monte Carlo sampling and dynamic programming. Experimental results show the effectiveness of the proposed approach.
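
To make the idea of a tree with a time dimension and a simultaneity dimension concrete, the following is a minimal sketch, not the paper's actual model. It assumes a binary tree whose internal nodes split either along time or along simultaneity, and whose leaves are notes given as MIDI numbers. The names `Note`, `Split`, and `flatten`, and the rule that a time split halves its span equally, are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Note:
    """Leaf node: a single note (MIDI number is an assumed encoding)."""
    pitch: int


@dataclass
class Split:
    """Internal node splitting along one of the two dimensions.

    axis == "time": the right child follows the left child.
    axis == "sync": both children sound over the same time span.
    """
    axis: str
    left: "Tree"
    right: "Tree"


Tree = Union[Note, Split]


def flatten(tree: Tree, onset: float = 0.0,
            duration: float = 1.0) -> List[Tuple[int, float, float]]:
    """Expand a tree into (pitch, onset, duration) triples.

    Assumption for illustration: a time split divides its span into
    two equal halves; a sync split gives both children the full span.
    """
    if isinstance(tree, Note):
        return [(tree.pitch, onset, duration)]
    if tree.axis == "time":
        half = duration / 2.0
        return (flatten(tree.left, onset, half)
                + flatten(tree.right, onset + half, half))
    if tree.axis == "sync":
        return (flatten(tree.left, onset, duration)
                + flatten(tree.right, onset, duration))
    raise ValueError(f"unknown axis: {tree.axis!r}")


# A three-note chord (C, E, G) followed by a single higher C.
chord = Split("sync", Note(60), Split("sync", Note(64), Note(67)))
phrase = Split("time", chord, Note(72))
events = flatten(phrase)
```

In this sketch the chord's three notes share one onset and duration because they sit under "sync" splits, while the final note starts where the chord ends because it sits on the far side of a "time" split. A grammar over such trees would assign probabilities to the splits; that part is omitted here.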