Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes