Learning Sound Localization Better from Semantically Similar Samples