Behavior learning method and apparatus in software robot

A method and an apparatus for learning the behavior of a software robot are provided. They learn the correlation between a behavior and the internal state of the software robot, perform this learning across all behaviors and internal states permitted in the software robot, and enable the software robot to treat every input it can recognize as a reward or a penalty even if the user gives no reward or penalty as feedback. A behavior performer (50) carries out a behavior of the software robot. An episode memory (60) holds a plurality of episodes, each storing a variance related to a state. From these it searches for the episode corresponding to the type of the performed behavior and to the types of the object and the state recognized in the virtual space, and it calculates a representative variance from the variance stored in the found episode and the variance generated by the performed behavior. The episode memory then stores the calculated representative variance as the variance of the found episode. A perception unit (20) manages the results of the software robot perceiving the environment information of the virtual space and the physical state of its body.
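
The episode memory update described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the abstract does not state how the representative variance is computed, so the weighted average and the `learning_rate` parameter are assumptions, and the class, method, and key names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# An episode is identified by (behavior type, object type, state type).
EpisodeKey = Tuple[str, str, str]


@dataclass
class EpisodeMemory:
    # Assumption: the representative variance is a weighted average of the
    # stored variance and the newly generated one. The abstract does not
    # give the formula; learning_rate is a hypothetical parameter.
    learning_rate: float = 0.5
    episodes: Dict[EpisodeKey, float] = field(default_factory=dict)

    def update(self, behavior: str, obj: str, state: str,
               generated_variance: float) -> float:
        """Find the matching episode and store its representative variance."""
        key = (behavior, obj, state)
        stored = self.episodes.get(key)
        if stored is None:
            # No matching episode yet: start from the generated variance.
            representative = generated_variance
        else:
            representative = ((1.0 - self.learning_rate) * stored
                              + self.learning_rate * generated_variance)
        # The representative variance replaces the episode's stored variance.
        self.episodes[key] = representative
        return representative


memory = EpisodeMemory(learning_rate=0.5)
first = memory.update("eat", "ball", "hunger", -10.0)   # new episode
second = memory.update("eat", "ball", "hunger", -20.0)  # blended with stored value
```

With a learning rate of 0.5, the second call blends the stored value -10.0 with the new value -20.0, so the episode ends up holding -15.0. A higher rate would weight recent behavior more heavily, while a lower rate would make the stored history more stable.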