The MSR-Video to Text dataset with clean annotations