Probing Cross-modal Semantics Alignment Capability from the Textual Perspective