Data-Driven Recomposition using the Hierarchical Dirichlet Process Hidden Markov Model

Hidden Markov Models (HMMs) have been widely used in audio analysis tasks such as speech recognition and genre classification. In this paper we show how HMMs can be used to synthesize new audio clips of unlimited length, inspired by the temporal structure and perceptual content of a training recording or a set of such recordings. We use Markov chain techniques, similar to those long used to generate symbolic data such as text and musical scores, to instead generate sequences of continuous audio feature data, which are then transformed into audio using feature-based and concatenative synthesis. Additionally, we explore the use of the Hierarchical Dirichlet Process HMM (HDP-HMM) for music, which sidesteps some difficulties with traditional HMMs, most notably the need to fix the number of hidden states in advance, and we extend the HDP-HMM so that multiple song models can be trained simultaneously, allowing models to be blended to produce output that is a hybrid of several input recordings.
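To make the pipeline concrete, the sketch below trains an HMM on frame-level audio features, samples a new feature sequence of arbitrary length from the learned Markov chain, and resynthesizes it by concatenating the training frames nearest to each generated feature vector. This is only an illustrative approximation: MFCC features, the hmmlearn GaussianHMM (with a fixed number of states standing in for the HDP-HMM, which would infer that number from the data), the file names, and the nearest-frame concatenative step are all assumptions, not details taken from the paper.

```python
# Minimal sketch: HMM-based recomposition of an audio recording.
import numpy as np
import librosa
import soundfile as sf
from hmmlearn import hmm

# --- Feature extraction (MFCCs chosen purely for illustration) ---
y, sr = librosa.load("training_clip.wav", sr=22050)   # hypothetical input file
hop = 512
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
X = mfcc.T                                             # shape: (n_frames, n_features)

# --- Train a standard Gaussian-emission HMM on the feature frames.
# --- A fixed state count stands in for the HDP-HMM's inferred one. ---
model = hmm.GaussianHMM(n_components=20, covariance_type="diag", n_iter=100)
model.fit(X)

# --- Generate a new feature sequence of arbitrary length ---
n_out = 2000
X_gen, states = model.sample(n_out)

# --- Concatenative resynthesis: for each generated frame, find the
# --- nearest training frame and splice its audio samples together ---
frames_out = []
for v in X_gen:
    idx = int(np.argmin(np.linalg.norm(X - v, axis=1)))  # nearest training frame
    start = idx * hop
    frames_out.append(y[start:start + hop])
out = np.concatenate(frames_out)

sf.write("recomposed.wav", out, sr)                    # write the new clip
```

A feature-based synthesizer could replace the nearest-frame lookup by mapping each generated feature vector onto synthesis parameters directly; the Markov-chain sampling step stays the same either way.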
