Baseline WSJ Acoustic Models for HTK and Sphinx: Training Recipes and Recognition Experiments

Speech recognition research often needs a competent baseline acoustic model as a starting point. But training and tuning such a model with research recognizers such as Cambridge’s HTK and CMU’s Sphinx can be time-consuming. To reduce duplicated effort, I have created training recipes for HTK and Sphinx that use the standard Wall Street Journal (WSJ) training corpus. This paper describes these recipes. I evaluate the word error rate (WER) and real-time performance of the resulting models across different HMM topologies, numbers of tied states, numbers of Gaussians, and test sets. My goal is to provide practical advice and results to researchers who are considering HTK or Sphinx for real-time recognition on dictation-like tasks.