Evaluating Surprise Adequacy for Deep Learning System Testing