Reversal tasks have been used to evaluate perseverative or compulsive behavior in genetic mouse models of various neurological and psychiatric disorders. However, because conventional reversal tasks are limited to two options, it is difficult to distinguish impairments in exploratory propensity from impairments in the extinction of learned behaviors. We therefore developed a five-choice exploratory operant task, the five-arm bandit task (5-ABT), to assess behavioral flexibility. The task consists of Phase 1, in which all five options are rewarded with the same constant probability; Phase 2, in which only one option is rewarded; and Phase 3, in which the reward pattern of Phase 2 is reversed. Using this task, we analyzed the behavior of big potassium (BK) channel knockout (BK KO) mice. Choice entropy in Phase 3 differed significantly between BK KO mice and their wild-type littermates, whereas choice entropy in Phase 1 did not. This suggests that BK KO mice are impaired in the extinction of learned options, or in learning from mistakes, rather than in exploration. To quantify the ability to learn from rewarded and unrewarded experiences, we also estimated positive and negative learning rate parameters by fitting a Q-learning model with differential learning rates to the behavioral data. The negative learning rate of BK KO mice was significantly lower than that of wild-type mice, indicating that their ability to learn from mistakes was impaired. This method enables the examination of candidate murine models of various neurological and psychiatric disorders, including autism and addiction, and the subsequent evaluation of pharmacological effects.
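The two quantities named above, choice entropy and a Q-learning model with separate positive and negative learning rates, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the function names, the use of Shannon entropy in bits, the softmax choice rule with inverse temperature `beta`, the zero initial Q-values, and the convention that the positive rate applies on rewarded trials and the negative rate on unrewarded trials.

```python
import math
from collections import Counter


def choice_entropy(choices, n_options=5):
    """Shannon entropy (in bits) of the empirical choice distribution.

    Assumption: options are coded 0..n_options-1; the abstract does not
    state which entropy definition or log base was used.
    """
    counts = Counter(choices)
    total = len(choices)
    entropy = 0.0
    for option in range(n_options):
        p = counts.get(option, 0) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def q_update(q, choice, reward, alpha_pos, alpha_neg):
    """Return new Q-values after one trial (hypothetical update rule).

    Rewarded trials use alpha_pos, unrewarded trials use alpha_neg.
    """
    alpha = alpha_pos if reward > 0 else alpha_neg
    new_q = list(q)
    new_q[choice] += alpha * (reward - q[choice])
    return new_q


def softmax(q, beta):
    """Choice probabilities under a softmax rule (assumed, not from the source)."""
    q_max = max(q)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (value - q_max)) for value in q]
    total = sum(exps)
    return [e / total for e in exps]


def neg_log_likelihood(choices, rewards, alpha_pos, alpha_neg, beta, n_options=5):
    """Negative log-likelihood of a choice sequence under the sketched model."""
    q = [0.0] * n_options  # assumed initial values
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        nll -= math.log(softmax(q, beta)[choice])
        q = q_update(q, choice, reward, alpha_pos, alpha_neg)
    return nll
```

Under these assumptions, the learning rates for one animal could be estimated by minimizing `neg_log_likelihood` over `alpha_pos`, `alpha_neg`, and `beta` with a general-purpose optimizer. A lower fitted `alpha_neg` would then correspond to slower devaluation of an option after unrewarded choices.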