Improvement of multimodal gesture and speech recognition performance using time intervals between gestures and accompanying speech