Transformer networks with adaptive inference for scene graph generation