When analyzing the baseline system, we discovered that a word missing from the lattice tends to cause up to 3 word errors, and that such errors are hard to eradicate even with extremely powerful language models. We quantified the loss incurred when working from N-best lists as opposed to lattices, and consequently decided to use 1000-best lists. We learned that doubling the amount of language model training data is likely to reduce the word error rate by no more than 2%. We experimented with rank histograms as an alternative measure of modeling progress. We found that a mismatch between segmentation information in the training and test data is a significant problem, and that it can be overcome to some extent by hypothesizing boundaries; linguistic boundaries were found to be more informative than acoustic ones. Segment-based error analysis revealed, surprisingly, that disfluent segments are no more error-prone than non-disfluent ones, and that long segments are no more erroneous than short ones. Finally, word-based error analysis yielded a list of features strongly correlated with word error.
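To illustrate the rank-histogram diagnostic mentioned above, the following is a minimal sketch, not the implementation used in this work: it assumes each N-best hypothesis is stored as a dictionary with an `id` and a total model `score` (both names are hypothetical), and it tallies the rank at which the reference hypothesis surfaces after rescoring. A histogram whose mass shifts toward rank 1 as the model improves is the kind of progress signal the text refers to.

```python
from collections import Counter

def rank_histogram(nbest_lists, reference_ids):
    """Tally, over all utterances, the rank at which the reference
    hypothesis appears in each rescored N-best list.

    Assumed data layout (hypothetical, for illustration only):
    each hypothesis is {"id": str, "score": float}, higher score = better.
    """
    histogram = Counter()
    for hyps, ref in zip(nbest_lists, reference_ids):
        # Sort hypotheses by total score, best first.
        ranked = sorted(hyps, key=lambda h: h["score"], reverse=True)
        for rank, hyp in enumerate(ranked, start=1):
            if hyp["id"] == ref:
                histogram[rank] += 1
                break
        else:
            # Reference absent from the N-best list (cf. the lattice-miss
            # problem discussed above).
            histogram["miss"] += 1
    return histogram

# Toy usage: two 3-best lists; the reference ranks 1st in the first
# list and 2nd in the second.
nbest = [
    [{"id": "a", "score": -10.2}, {"id": "b", "score": -11.0},
     {"id": "c", "score": -12.4}],
    [{"id": "d", "score": -9.1}, {"id": "e", "score": -9.0},
     {"id": "f", "score": -13.3}],
]
refs = ["a", "d"]
print(rank_histogram(nbest, refs))  # Counter({1: 1, 2: 1})
```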