Sound Source Localization is All about Cross-Modal Alignment