Method and apparatus for voice recognition using video recognition

The present invention relates to a voice recognition method and a voice recognition device that use video recognition to determine the start and end of voice recognition more accurately. The voice recognition method according to an embodiment of the present invention includes: a step of determining the start of utterance based on at least one of first video data and first audio data obtained before conversion to a voice recognition mode; a step of converting to the voice recognition mode when the start of utterance is determined, and generating second audio data which includes a voice command of a user; and a step of determining the end of utterance based on at least one of second video data and the second audio data obtained after conversion to the voice recognition mode. According to the present invention, a user can execute a voice recognition function simply by moving the lips, without having to input a separate gesture, and the start and end of voice recognition can be determined more clearly through video recognition. [Reference numerals] (310) Confirming the start of utterance; (320) Displaying a voice command; (330) Recognizing the voice; (340) Confirming the end of utterance; (350) Displaying the result of voice recognition; (360) Executing a function; (AA) Main computer
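
The three steps of the method can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Frame` type, the `lips_moving` flag (standing in for the output of some video-based lip-motion detector), the `audio_energy` value, the threshold, and the rule that utterance ends after a fixed number of consecutive silent frames are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Frame:
    """One synchronized video/audio frame (hypothetical structure)."""
    lips_moving: bool     # assumed output of a video lip-motion detector
    audio_energy: float   # assumed per-frame audio energy, 0.0 to 1.0
    samples: str = ""     # stand-in for the frame's audio payload


def is_utterance_start(frame: Frame, energy_threshold: float) -> bool:
    # Start is decided from at least one of the video cue and the audio cue.
    return frame.lips_moving or frame.audio_energy >= energy_threshold


def is_silent(frame: Frame, energy_threshold: float) -> bool:
    # A frame counts as silent only if neither cue indicates speech.
    return not frame.lips_moving and frame.audio_energy < energy_threshold


def capture_voice_command(
    frames: Iterable[Frame],
    energy_threshold: float = 0.5,  # assumed value
    end_after: int = 2,             # assumed: silent frames that end utterance
) -> Optional[List[str]]:
    """Return the audio captured in voice recognition mode, or None."""
    in_voice_mode = False
    silent_run = 0
    captured: List[str] = []

    for frame in frames:
        if not in_voice_mode:
            # Step 1: watch the first video/audio data for a start of utterance.
            if not is_utterance_start(frame, energy_threshold):
                continue
            # Step 2: convert to voice recognition mode.
            in_voice_mode = True

        # Step 2 (continued): accumulate the second audio data.
        captured.append(frame.samples)

        # Step 3: watch the second video/audio data for the end of utterance.
        silent_run = silent_run + 1 if is_silent(frame, energy_threshold) else 0
        if silent_run >= end_after:
            return captured[:-silent_run]  # drop the trailing silent frames

    # Stream ended before an end of utterance was confirmed.
    return captured if in_voice_mode else None


if __name__ == "__main__":
    stream = [
        Frame(False, 0.1, "x"),  # before utterance: ignored
        Frame(True, 0.2, "a"),   # lips move: start of utterance
        Frame(True, 0.8, "b"),
        Frame(False, 0.1, "-"),  # short pause, not yet the end
        Frame(True, 0.7, "c"),
        Frame(False, 0.0, "-"),
        Frame(False, 0.0, "-"),  # second silent frame: end of utterance
        Frame(True, 0.9, "z"),   # after the end: never read
    ]
    print(capture_voice_command(stream))
```

In this sketch a single pause frame does not end the capture; only `end_after` consecutive frames with neither lip motion nor sufficient audio energy do, which is one simple way to combine the two cues.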