Visual image and sound localization with interaction