Knowledge Distillation Using Output Errors for Self-Attention ASR Models