We understand our environment by integrating information obtained through the senses of sight, hearing, and touch. To integrate information across senses, we must find correspondences between events observed by the different senses. This paper presents a general method for relating audio and visual events when more than one movement, whether repetitive or non-repetitive, is observed with a single camera and a single microphone. The method relies on general laws rather than object-specific knowledge. As correspondence cues, we use Gestalt grouping laws: simultaneity between the occurrence of a sound and a change in movement, and similarity of repetition between sound and movement. Experiments conducted in a real environment yielded satisfactory results, demonstrating the effectiveness of the proposed method.
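As a rough illustration of how the two grouping cues could be combined, the following Python sketch scores each observed movement against a sound stream and picks the best match. It is not the paper's algorithm. The event representation (sound onset times and times of abrupt movement change, in seconds), the tolerance `tol`, the weight `w`, the median-interval period estimate, and all function names are hypothetical choices made for this example.

```python
import statistics
from typing import Dict, List, Tuple


def simultaneity_score(sound_onsets: List[float], motion_changes: List[float],
                       tol: float = 0.05) -> float:
    """Fraction of sound onsets that have a movement-change event within `tol` seconds."""
    if not sound_onsets:
        return 0.0
    hits = sum(1 for s in sound_onsets
               if any(abs(s - m) <= tol for m in motion_changes))
    return hits / len(sound_onsets)


def estimate_period(events: List[float]) -> float:
    """Median inter-event interval; 0.0 means 'non-repetitive' (fewer than two events)."""
    if len(events) < 2:
        return 0.0
    return statistics.median(b - a for a, b in zip(events, events[1:]))


def repetition_similarity(p_audio: float, p_motion: float) -> float:
    """1.0 for identical periods, smaller as they diverge; 0.0 if either is non-repetitive."""
    if p_audio <= 0 or p_motion <= 0:
        return 0.0
    return min(p_audio, p_motion) / max(p_audio, p_motion)


def match_sound_to_movement(sound_onsets: List[float],
                            movements: Dict[str, List[float]],
                            tol: float = 0.05,
                            w: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Return the movement whose changes best group with the sound, plus all scores.

    Hypothetical weighting: w * simultaneity + (1 - w) * repetition similarity.
    If the sound itself is non-repetitive, only the simultaneity cue is used.
    """
    p_audio = estimate_period(sound_onsets)
    scores: Dict[str, float] = {}
    for name, changes in movements.items():
        s = simultaneity_score(sound_onsets, changes, tol)
        if p_audio == 0.0:
            scores[name] = s
        else:
            r = repetition_similarity(p_audio, estimate_period(changes))
            scores[name] = w * s + (1 - w) * r
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: a periodic sound, one movement changing in sync with it,
# and another movement repeating at a different rate and phase.
sound = [0.50, 1.00, 1.50, 2.00]
movements = {
    "hammer": [0.49, 1.01, 1.52, 2.00],
    "hand_wave": [0.30, 1.10, 1.90],
}
best, scores = match_sound_to_movement(sound, movements)
print(best, scores)
```

Here the synchronized movement wins on both cues, while the other movement shares neither the timing of the sound nor its period. Falling back to simultaneity alone for a single, non-repetitive sound is one simple way to cover the non-repetitive case; a real system would also need the onset and movement-change detectors that this sketch takes as given.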