String- and lattice-based discriminative training for the Corpus of Spontaneous Japanese lecture transcription task

Abstract

This article aims to provide a comprehensive set of acoustic model discriminative training results for the Corpus of Spontaneous Japanese (CSJ) lecture speech transcription task. Discriminative training was carried out for this task with a 100,000-word trigram language model, for several acoustic model topologies, with both diagonal and full covariance models, and under both string-based and lattice-based training paradigms. We describe our implementation of the proposal by Macherey et al. for numerical subtraction of the reference lattice statistics from the competitor lattice statistics during lattice-based Minimum Classification Error (MCE) training. We also present results for lattice-based training that does not use such subtraction, corresponding to the well-known Maximum Mutual Information (MMI) approach. Discriminative training yielded relative reductions in Word Error Rate (WER) of up to 13%. Specific problems encountered in implementing discriminative training for this task are discussed.

1. Introduction