A SYSTEM FOR DATA-DRIVEN CONCATENATIVE SOUND SYNTHESIS

In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis yields a new method of high-quality sound synthesis. Conventional synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis preserves these details by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in the many sound recordings; for example, natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to perform data-driven concatenative sound synthesis by unit selection. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, which increases flexibility and facilitates reuse.
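
The abstract describes unit selection only at a high level. As a minimal sketch of how such a search is commonly realized (a Viterbi-style dynamic program balancing a target cost against a concatenation cost), consider the Python below; it is not CATERPILLAR's actual implementation, and the names `select_units`, `target_cost`, and `concat_cost`, as well as the toy feature tuples, are hypothetical.

```python
import numpy as np

def select_units(targets, candidates, target_cost, concat_cost):
    """Viterbi-style unit selection (a generic sketch, not CATERPILLAR's code).

    For each target segment, pick one database unit so that the sum of
    target costs (how well a unit matches the desired segment) and
    concatenation costs (how well adjacent units join) is minimal over
    the whole sequence.
    """
    n = len(targets)
    # best[i][j]: minimal total cost of a path ending in candidate j of target i
    best = [np.array([target_cost(targets[0], u) for u in candidates[0]], dtype=float)]
    back = []  # backpointers: for each step, the best predecessor of each candidate
    for i in range(1, n):
        tc = np.array([target_cost(targets[i], u) for u in candidates[i]], dtype=float)
        # cc[k, j]: cost of joining previous candidate k to current candidate j
        cc = np.array([[concat_cost(p, u) for u in candidates[i]]
                       for p in candidates[i - 1]], dtype=float)
        total = best[-1][:, None] + cc          # shape: (prev, cur)
        back.append(total.argmin(axis=0))
        best.append(total.min(axis=0) + tc)
    # Backtrack from the cheapest final candidate
    path = [int(best[-1].argmin())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]

# Toy usage: units are (pitch, loudness) tuples; costs are simple distances.
targets = [(60, 0.8), (62, 0.7), (64, 0.9)]
candidates = [[(59, 0.8), (60, 0.6)], [(62, 0.7), (61, 0.9)], [(64, 0.9)]]
tcost = lambda t, u: abs(t[0] - u[0]) + abs(t[1] - u[1])
ccost = lambda a, b: 0.1 * abs(a[1] - b[1])   # penalize loudness jumps at joins
print(select_units(targets, candidates, tcost, ccost))
```

Because the concatenation cost couples neighboring choices, a greedy per-segment pick can be globally suboptimal; the dynamic program above considers all join costs along the sequence, which is what lets selection stay "aware of the context of the database units".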
