Visual Compositional Learning for Human-Object Interaction Detection