Object Segmentation by Mining Cross-Modal Semantics