Model and Architecture of Agent-Based Hierarchical Reinforcement Learning

A hierarchical reinforcement learning method is improved by introducing the frequency maximum Q (FMQ) heuristic learning algorithm. The improved method addresses the problem of learning an agent's optimal policy in a large-scale state space and a dynamically changing environment. The classical belief-desire-intention (BDI) model is extended with an attribute maintenance operator and with commitment and planning attributes, which increases the agent's adaptability and online learning ability, and a rational maintenance process for the agent's mental attributes is given. A new agent system and architecture with initiative, autonomy, adaptability, and sociality is proposed, and a new path planning agent (APP) is developed on the basis of this architecture. By configuring the driving environment, complex vehicle driving states are simulated. Through continuous learning over these driving states, the agent eventually obtains the optimal path, which verifies the feasibility and effectiveness of the new architecture.
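
For orientation, the frequency maximum Q heuristic is usually stated as an action-selection value EV(a) = Q(a) + c * freq(maxR(a)) * maxR(a), where maxR(a) is the largest reward seen so far for action a and freq is the fraction of times that reward was obtained when a was chosen. The Python sketch below illustrates only this single-state rule. The class name `FMQAgent`, its parameters (`alpha`, `c`, `temperature`), and the Boltzmann selection step are assumptions made for illustration; how the heuristic is embedded in the hierarchical method and the BDI-based architecture is not specified in the abstract and is not reproduced here.

```python
import math
import random


class FMQAgent:
    """Single-state FMQ learner (illustrative sketch, not the paper's code)."""

    def __init__(self, n_actions, alpha=0.1, c=10.0, temperature=1.0):
        self.n_actions = n_actions
        self.alpha = alpha              # learning rate (assumed value)
        self.c = c                      # weight of the FMQ bonus (assumed value)
        self.temperature = temperature  # Boltzmann temperature (assumed value)
        self.q = [0.0] * n_actions
        self.max_r = [-math.inf] * n_actions  # best reward seen per action
        self.count = [0] * n_actions          # times each action was taken
        self.count_max = [0] * n_actions      # times max_r was obtained

    def update(self, action, reward):
        """Update Q and the max-reward frequency statistics for one step."""
        self.count[action] += 1
        if reward > self.max_r[action]:
            # A new maximum resets its occurrence counter.
            self.max_r[action] = reward
            self.count_max[action] = 1
        elif reward == self.max_r[action]:
            self.count_max[action] += 1
        self.q[action] += self.alpha * (reward - self.q[action])

    def heuristic_value(self, action):
        """EV(a) = Q(a) + c * freq(maxR(a)) * maxR(a)."""
        if self.count[action] == 0:
            return self.q[action]  # no statistics yet, fall back to Q
        freq = self.count_max[action] / self.count[action]
        return self.q[action] + self.c * freq * self.max_r[action]

    def select_action(self, rng=random):
        """Boltzmann (softmax) selection over the heuristic values."""
        values = [self.heuristic_value(a) / self.temperature
                  for a in range(self.n_actions)]
        top = max(values)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in values]
        return rng.choices(range(self.n_actions), weights=weights)[0]
```

In this sketch an action that frequently yields its best observed reward receives a larger bonus than one that reached a high reward only once, which is the intuition behind biasing exploration toward reliably good actions.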