A SYSTEM FOR DATA-DRIVEN CONCATENATIVE SOUND SYNTHESIS

In speech synthesis, concatenative data-driven synthesis methods prevail. They use a database of recorded speech and a unit selection algorithm that selects the segments that best match the utterance to be synthesized. Transferring these ideas to musical sound synthesis yields a new method of high-quality sound synthesis. Conventional synthesis methods are based on a model of the sound signal, and it is very difficult to build a model that preserves all the fine details of sound. Concatenative synthesis preserves these details by using actual recordings. This data-driven approach (as opposed to a rule-based approach) takes advantage of the information contained in the many sound recordings; for example, natural-sounding transitions can be synthesized, since unit selection is aware of the context of the database units. The CATERPILLAR software system has been developed to perform data-driven concatenative sound synthesis by unit selection. It allows high-quality instrument synthesis with high-level control, explorative free synthesis from arbitrary sound databases, or resynthesis of a recording with sounds from the database. It is based on the software-engineering concept of component-oriented software, which increases flexibility and facilitates reuse.
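
The abstract describes unit selection only at a high level. As a minimal sketch of how such a search is commonly realized (a Viterbi-style dynamic program balancing a target cost against a concatenation cost), consider the Python below; it is not CATERPILLAR's actual implementation, and the names `select_units`, `target_cost`, and `concat_cost`, as well as the toy feature tuples, are hypothetical.

```python
import numpy as np

def select_units(targets, candidates, target_cost, concat_cost):
    """Viterbi-style unit selection (a generic sketch, not CATERPILLAR's code).

    For each target segment, pick one database unit so that the sum of
    target costs (how well a unit matches the desired segment) and
    concatenation costs (how well adjacent units join) is minimal over
    the whole sequence.
    """
    n = len(targets)
    # best[i][j]: minimal total cost of a path ending in candidate j of target i
    best = [np.array([target_cost(targets[0], u) for u in candidates[0]], dtype=float)]
    back = []  # backpointers: for each step, the best predecessor of each candidate
    for i in range(1, n):
        tc = np.array([target_cost(targets[i], u) for u in candidates[i]], dtype=float)
        # cc[k, j]: cost of joining previous candidate k to current candidate j
        cc = np.array([[concat_cost(p, u) for u in candidates[i]]
                       for p in candidates[i - 1]], dtype=float)
        total = best[-1][:, None] + cc          # shape: (prev, cur)
        back.append(total.argmin(axis=0))
        best.append(total.min(axis=0) + tc)
    # Backtrack from the cheapest final candidate
    path = [int(best[-1].argmin())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]

# Toy usage: units are (pitch, loudness) tuples; costs are simple distances.
targets = [(60, 0.8), (62, 0.7), (64, 0.9)]
candidates = [[(59, 0.8), (60, 0.6)], [(62, 0.7), (61, 0.9)], [(64, 0.9)]]
tcost = lambda t, u: abs(t[0] - u[0]) + abs(t[1] - u[1])
ccost = lambda a, b: 0.1 * abs(a[1] - b[1])   # penalize loudness jumps at joins
print(select_units(targets, candidates, tcost, ccost))
```

Because the concatenation cost couples neighboring choices, a greedy per-segment pick can be globally suboptimal; the dynamic program above considers all join costs along the sequence, which is what lets selection stay "aware of the context of the database units".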
