Improving zero-shot action recognition using human instruction with text description