Real-time concatenative synthesis for networked musical interactions

The recent proliferation of Networked Music Performances has prompted research into low-latency, low-bitrate musical encoding schemes, including audio codecs and control protocols that specifically address the requirements of live musical interaction across the Internet. This work presents an alternative perspective inspired by the 'synthesis by analysis' approach, under strict constraints on processing latency and rendering quality. The process is fully automated and comprises an offline phase, which takes place prior to performance, and a real-time analysis-synthesis phase. The offline phase processes a solo recording of each musician's part to obtain an audio segment for each note in the performance and to train a Hidden Markov Model that is later used for online analysis. During live performance, online analysis encodes the position of the performance on the music score, and the waveform is resynthesized by concatenating the audio segments obtained in the offline phase. Although the synthesized waveform originates from an offline recording, it is synchronized to the live performance at note level, which allows a wide range of musical tempi, as well as their expressive variations, to be rendered. The paper presents the complete methodology and reports on implementation details and preliminary evaluation results.
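The two-phase idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the audio features, the model topology, or how segments are joined. The sketch assumes a left-to-right Hidden Markov Model with one state per score note, a per-frame observation that is simply a detected MIDI pitch, and plain concatenation without crossfading. The names `forward_step`, `follow_and_resynthesize`, `p_stay`, and `p_match` are hypothetical.

```python
from typing import List, Sequence


def forward_step(
    belief: Sequence[float],
    observed_pitch: int,
    score_pitches: Sequence[int],
    p_stay: float = 0.6,    # assumed probability of remaining on the same note
    p_match: float = 0.9,   # assumed probability that the observed pitch is the score pitch
) -> List[float]:
    """One forward-filtering step of a left-to-right HMM (one state per score note)."""
    n = len(score_pitches)
    predicted = [0.0] * n
    # Transition: each note either persists or advances to the next note.
    for i, b in enumerate(belief):
        if i == n - 1:
            predicted[i] += b  # the last note can only persist
        else:
            predicted[i] += b * p_stay
            predicted[i + 1] += b * (1.0 - p_stay)
    # Emission: weight each state by how well it explains the observed pitch.
    weighted = [
        p * (p_match if observed_pitch == score_pitches[i] else 1.0 - p_match)
        for i, p in enumerate(predicted)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def follow_and_resynthesize(
    observations: Sequence[int],
    score_pitches: Sequence[int],
    segments: Sequence[Sequence[float]],
) -> List[float]:
    """Track the score position frame by frame and append the pre-recorded
    segment of each note the first time that note becomes the most likely state."""
    belief = [1.0] + [0.0] * (len(score_pitches) - 1)
    current = -1  # index of the last note whose segment was emitted
    output: List[float] = []
    for pitch in observations:
        belief = forward_step(belief, pitch, score_pitches)
        best = max(range(len(belief)), key=belief.__getitem__)
        if best > current:
            output.extend(segments[best])  # plain concatenation, no crossfade
            current = best
    return output


# Toy data: a three-note score and one short "audio" segment per note.
score = [60, 62, 64]
note_segments = [[0.1, 0.2], [0.3], [0.4, 0.5]]
frames = [60, 60, 62, 62, 64]
resynthesized = follow_and_resynthesize(frames, score, note_segments)
```

In this sketch only the score position has to be known at the synthesis side, since the audio itself comes from the segments collected offline. A practical system would additionally need to handle segment durations that differ from the live note durations and to smooth the joins between segments.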
