Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering