Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey