ABSTRACT

The difference between the standards used for film and for video creates problems when a conversion from one format to the other is required: since all the images are displayed, the change of frame rate induces a pitch change in the sound. To avoid this problem, the whole soundtrack has to be processed during the duplication. In this paper, we address the corresponding sound transformation problem, namely the dilation of the sound spectrum without changing its duration. For broadcasting applications, the transposition ratio lies within the range 24/25 to 25/24. The wide variety of sounds (music, speech, noise, ...) used in movies led us first to construct a database of representative sounds containing transient, noisy and quasi-periodic sounds. This database has been used to compare the performances of different approaches. A review of the best-known methods clearly shows significant disparities between them according to the class of the signal. This led us to reconsider the problem and to propose methods based on wavelet transforms.

1. INTRODUCTION

Due to the coexistence of different standards in film (twenty-four frames per second) and video (twenty-five frames per second), conversions between these formats are often necessary [1]. Generally speaking, since the number of images per unit of time differs, the duration of the whole sequence varies according to the format used to play the same tape. This causes a dramatic change in the sound, due to its dilation in both the time and frequency domains, just as if a 45 RPM record were played at 33 RPM speed. This is why sound engineers have to compensate for the sound modification caused by the slowing down of the film, by means of an appropriate frequency stretching. We thus face a scientific problem of frequency stretching, also called pitch shifting.
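As a quick check of the magnitude of this transposition, the following sketch (plain Python, no audio libraries; the function names are ours, not part of any cited system) computes the frequency ratio implied by the film-to-video frame-rate change and expresses it in cents:

```python
import math

def transposition_ratio(target_fps: float, source_fps: float) -> float:
    """Frequency ratio induced by playing source_fps material at target_fps.

    Playing 24 fps film at 25 fps shortens the sound and raises every
    frequency by the factor 25/24; the pitch shifter must therefore
    transpose by the inverse ratio 24/25 to compensate.
    """
    return target_fps / source_fps

def ratio_to_cents(ratio: float) -> float:
    """Express a frequency ratio as a pitch interval in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(ratio)

ratio = transposition_ratio(25.0, 24.0)   # 25/24, approximately 1.0417
cents = ratio_to_cents(ratio)             # approximately +70.7 cents
print(f"ratio = {ratio:.4f}, shift = {cents:+.1f} cents")
```

The resulting shift of about 71 cents, roughly 0.7 semitone, is clearly audible on music and voice, which is why the whole soundtrack must be transposed back during duplication rather than left uncorrected.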
The aim is to preserve the quality of the whole soundtrack according to broadcasting standards. For our purpose, we shall consider that the frequency stretching ratio is above unity; in other words, the pitch change is a transposition towards higher frequencies, although the opposite case is also a post-production problem.

Up to now, in post-production studios, this sound transformation has been performed by a professional machine, the Lexicon 2400. The transposition can be performed simultaneously on only two stereo channels, and the technology used is more than fifteen years old. However, the generalization of digital multichannel sound leads professionals to consider other machines, using new technologies and based on scientific research in digital signal processing in the field of pitch shifting algorithms. Our work started by collecting the available software and systems and testing them on a sound database. Listening tests led us to conclude that the Lexicon 2400 does not give perfect results but remains the most suitable machine for broadcasting applications.

2. BANK OF SOUNDS

To estimate the quality of existing and newly developed algorithms, we constituted a bank of sounds. Since our aim is to preserve the quality of every element of the soundtrack, we collected the widest possible variety of sounds. From the cinematographic point of view, these comprise speech, music, sound effects and ambiences. From a signal processing point of view, we classify them as quasi-periodic, transient, noisy and inharmonic signals. Our database has been made from:
- Female and male voices, characteristic of quasi-periodic and noisy signals.
[1] H. Fastl et al., Psychoacoustics: Facts and Models, 1990.
[2] P. Depalle et al., "Spectral Envelopes and Inverse FFT Synthesis", 1992.
[3] X. Rodet, "Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models", 1997.
[4] M. Portnoff, "Time-frequency representation of digital signals and systems based on short-time Fourier analysis", 1980.
[5] M. Portnoff, "Short-time Fourier analysis of sampled speech", 1981.
[6] M. Portnoff et al., "Time-scale modification of speech based on short-time Fourier analysis", 1981.
[7] J. A. Moorer et al., "The Use of the Phase Vocoder in Computer Music Applications", 1976.
[8] G. Fairbanks et al., "Method for time or frequency compression-expansion of speech", 1954.
[9] J. Laroche, "Autocorrelation method for high-quality time/pitch-scaling", Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1993.
[10] M. Dolson et al., "The Phase Vocoder: A Tutorial", 1986.
[11] R. Kronland-Martinet et al., "The Wavelet Transform for Analysis, Synthesis, and Processing of Speech and Music Sounds", 1988.
[12] J. S. Lim et al., "Signal estimation from modified short-time Fourier transform", ICASSP, 1983.