Sound Localization by Self-Supervised Time Delay Estimation