LONG-TERM TEMPORAL MODELING FOR VIDEO ACTION UNDERSTANDING