Cross-modal incongruity aligning and collaborating for multi-modal sarcasm detection