A Closer Look at Weakly-Supervised Audio-Visual Source Localization