Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation