Speech coding using mixture of Gaussians polynomial model

Parham Zolfaghari, Tony Robinson†

CREST/ATR Human Information Processing Research Labs, Kyoto 619-02, Japan
email: zparham@hip.atr.co.jp

†Cambridge University Engineering Department, Cambridge CB2 1PZ, UK
email: ajr@eng.cam.ac.uk

ABSTRACT

We have investigated a novel method of spectral estimation based on mixture of Gaussians in a sinusoidal analysis and synthesis framework. After quantisation of this parametric scheme a fixed frame-rate coder operating at a bit-rate of around 2.4 kbits/s has been developed. This paper describes an extension to this spectral model based on constraining the parameters of the mixture of Gaussians to be on a polynomial trajectory over a segment of speech data. This is referred to as the mixture of Gaussians polynomial model (MGPM). In order to realise a segmental coder, dynamic programming over the utterance is performed. The segmental representation of the spectra results in a log-likelihood score over a segment which is used as the cost function in the dynamic programming algorithm. Speech coding components such as pitch, voicing and gain are described segmentally. A number of segmental coders are presented with bit-rates in the range of 350 to 650 bits/s. These coders offer good and intelligible coded speech evaluated using DRT scoring at these bit-rates.

1. INTRODUCTION

A segmental framework employs the inter-frame or time dependence of the spectral representation. This dependence is inherent in various segments of speech, such as sustained vowels, as the speech spectral envelope is a slow time-varying process and spectra of adjacent frames are highly correlated. Various forms of segmentation models have been applied to speech coding and recognition. In coding, Roucos et al. [11] describe a very low bit-rate segment vocoder operating at 150 bits/s for a single speaker. This low rate is achieved by vector quantisation (VQ) of all the LPC spectra in a segment as a single unit.
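
As an illustration of quantising a whole segment as a single unit, the following minimal sketch stacks the per-frame parameter vectors of a segment into one long vector and selects the nearest codebook entry, so that a single index is sent per segment. The codebook, the frame values and the function names are hypothetical toy choices made for this example and are not taken from [11]; the sketch also assumes fixed-length segments, which a real segment vocoder need not use.

```python
import math

def flatten_segment(frames):
    """Stack the per-frame parameter vectors of one segment into a single vector."""
    return [x for frame in frames for x in frame]

def quantise_segment(frames, codebook):
    """Return the index of the codebook entry closest (squared Euclidean) to the segment."""
    target = flatten_segment(frames)
    best_index, best_dist = -1, math.inf
    for index, entry in enumerate(codebook):
        dist = sum((t - e) ** 2 for t, e in zip(target, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bits_per_second(codebook_size, segment_duration_s):
    """Rate spent on the segment index alone: log2(codebook size) bits per segment."""
    return math.log2(codebook_size) / segment_duration_s

# Toy data (hypothetical): 2 frames per segment, 2 parameters per frame.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
segment = [[0.9, 0.1], [0.2, 0.8]]

index = quantise_segment(segment, codebook)   # one index for the whole segment
rate = bits_per_second(len(codebook), 0.1)    # assumed 100 ms segment duration
```

With this toy four-entry codebook the index costs 2 bits per segment, so an assumed 100 ms segment gives 20 bits/s for the spectral index. The point of the sketch is only that the cost per segment depends on the codebook size and not on the number of frames in the segment.
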
The Kang-Coulter 600 bits/s vocoder [6] also uses LPC methods followed by formant tracking to produce good quality speech with a reported DRT score of 79.9. These low bit-rates can also be achieved by a recognition-based approach where recognition units are coded. Holmes [5] has described a method which uses an underlying linear-trajectory formant model for both recognition and synthesis.

The contribution of this work is to model the envelope of the short-term power spectral density as a mixture of Gaussians [13]. In this framework a Gaussian roughly corresponds to a formant, with the Gaussian mean corresponding to the formant frequency and the variance corresponding to the bandwidth. This model was integrated in a sinusoidal model based speech coding scheme [14]. An advantage of this framework is that a speech segment may be modelled using a polynomial trajectory for the Gaussian means and variances. We have previously reported on a segmental coder using a linear polynomial trajectory for the Gaussian mixtures operating between 600-800 bits/s [15]. We extend this model to an R'th order polynomial to represent both means and variances of the Gaussians. In the speech recognition area, similar models have also been implemented for MFCC trajectories in an HMM-based system [4].

2. SEGMENTAL CODER STRUCTURE

The block structure of the coders described in this paper is as shown in Figure 1. A sinusoidal model framework based on the ideas of McAulay and Quatieri [8] is used. In this model, the speech signal is represented by a harmonic set of partials with varying amplitudes and frequencies. In accordance with our desire to build a very low bit-rate coder we restrict the sine waves to be harmonically related. The inverse FFT method of re-synthesis [3] is used and the phase of each harmonic is chosen at reconstruction time to minimise the mismatch with the previous frame.

The Spectral Envelope Estimation Vocoder (SEEVOC) envelope, devised by Paul [9], uses a robust peak detection algorithm to yield a smooth envelope as the underlying spectral representation. In order to operate in the low bit-rate region, the SEEVOC envelope needs to be efficiently coded. We use the mixture of Gaussians polynomial model to represent these spectra over a segment. Polynomial least squares