Detection of visual dialog scenes in video content based on structural and semantic features