Reinforcement learning is a model-free learning method that learns what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal. It offers a practical approach for systems operating in complex environments for which accurate models are very difficult to build. Many practical problems, however, require maximizing reward while keeping the associated cost within a limit. In coal mining, for example, production is closely tied to safety: output can be increased only within the range that safety conditions allow. Building on the Markov decision process (MDP) and reinforcement learning, this paper introduces the constrained Markov decision process into reinforcement learning. It extends the Q-learning algorithm with a cost factor, yielding a new Q-learning algorithm based on the constrained MDP. Finally, following the constraint between production and safety in coal mines, the paper presents a simulation study of action control for a coal shearer at a coal mine working face. The simulation results verify the validity of the method.
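The abstract does not state the modified update rule, so the following is only a minimal sketch of one common way to add a cost factor to tabular Q-learning: the reward is penalized by a weighted cost, r - lam * c, before the usual update. The penalty weight `lam`, the function names, and the one-state toy problem with "slow" and "fast" shearer actions are all assumptions made for illustration, not the paper's algorithm or data.

```python
from collections import defaultdict

def penalized_q_update(q, state, action, reward, cost, next_state,
                       actions, alpha=0.1, gamma=0.9, lam=1.0):
    """One tabular Q-learning step on the cost-penalized reward r - lam * c.

    `lam` is an assumed penalty weight; the paper's cost factor may differ.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = (reward - lam * cost) + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

def greedy_action(q, state, actions):
    """Return the action with the highest Q-value in `state`."""
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical one-state toy problem: "fast" yields more production (reward)
# but also a safety cost; "slow" yields less production at no cost.
ACTIONS = ["slow", "fast"]
OUTCOME = {"slow": (1.0, 0.0), "fast": (2.0, 3.0)}  # action -> (reward, cost)

def train(lam, steps=200):
    """Alternate between the two actions so the run is deterministic."""
    q = defaultdict(float)
    for i in range(steps):
        action = ACTIONS[i % 2]
        reward, cost = OUTCOME[action]
        penalized_q_update(q, "s", action, reward, cost, "s", ACTIONS, lam=lam)
    return q

if __name__ == "__main__":
    for lam in (0.0, 1.0):
        q = train(lam)
        print(f"lam={lam}: greedy action = {greedy_action(q, 's', ACTIONS)}")
```

In this toy setting, ignoring cost (`lam=0.0`) makes the greedy policy prefer "fast", while weighting cost (`lam=1.0`) shifts it to "slow". A full constrained-MDP treatment would typically also tune the weight against an explicit cost budget, which this sketch leaves out.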