Region Collaborative Network for Detection-Based Vision-Language Understanding