How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation