Cooperative Multi-Agent Reinforcement Learning Framework for Scalping Trading

We explore deep reinforcement learning (RL) algorithms for scalping trading and found that no appropriate trading gym or agent examples exist. We therefore propose a gym and agents for finance, in the style of OpenAI Gym. In addition, we introduce a new RL framework based on our hybrid algorithm, which combines supervised learning with RL and uses meaningful observations, such as order book and settlement data, chosen from our experience of watching scalpers trade. This information is crucial in determining traders' behavior. To feed these data into our model, we use a spatio-temporal convolution layer (Conv3D) for order book data and a temporal convolution layer (Conv1D) for settlement data. The data are preprocessed by an episode filter that we developed. The agent consists of four sub-agents, each with its own clearly defined goal, so that together they make the best decision. We also adopt both value-based and policy-based algorithms in our framework. With these features, the agent can mimic scalpers as closely as possible. RL algorithms have already begun to surpass human capabilities in many domains. This approach could be a starting point for outperforming humans in the stock market as well, and a good reference for anyone who wants to design RL algorithms for real-world domains. Finally, we run experiments with our framework and report their progress.
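
To make the idea of a Gym-style trading environment concrete, the following is a minimal sketch of a reset()/step() interface over order book and settlement observations. It is not the framework described above: the names `Tick`, `ScalpingEnv`, `HOLD`, `BUY`, and `SELL`, the three-action space, and the mark-to-market reward are all assumptions made for illustration, and the sketch omits the episode filter, the convolution layers, and the four sub-agents.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical action encoding; the abstract does not specify the action space.
HOLD, BUY, SELL = 0, 1, 2


@dataclass
class Tick:
    """One time step of market data (illustrative fields only)."""
    order_book: List[float]   # e.g. resting volume at a few price levels
    settlements: List[float]  # e.g. recently traded volumes
    mid_price: float


Observation = Tuple[List[float], List[float]]


class ScalpingEnv:
    """Toy environment exposing a Gym-like reset()/step() interface."""

    def __init__(self, ticks: List[Tick]) -> None:
        if len(ticks) < 2:
            raise ValueError("need at least two ticks to form one transition")
        self.ticks = ticks
        self.t = 0
        self.position = 0  # -1 short, 0 flat, +1 long

    def _observe(self) -> Observation:
        tick = self.ticks[self.t]
        return tick.order_book, tick.settlements

    def reset(self) -> Observation:
        """Start a new episode at the first tick with a flat position."""
        self.t = 0
        self.position = 0
        return self._observe()

    def step(self, action: int) -> Tuple[Observation, float, bool, Dict]:
        """Apply an action, advance one tick, and return (obs, reward, done, info)."""
        if action == BUY:
            self.position = 1
        elif action == SELL:
            self.position = -1
        # HOLD leaves the current position unchanged.

        prev_price = self.ticks[self.t].mid_price
        self.t += 1
        # Assumed reward: price change over the step, signed by the position.
        reward = self.position * (self.ticks[self.t].mid_price - prev_price)
        done = self.t == len(self.ticks) - 1
        return self._observe(), reward, done, {}
```

An agent would call `reset()` once per episode and then call `step(action)` until `done` is true. In a fuller version, the two parts of the observation would be stacked over time and passed to the Conv3D and Conv1D layers respectively.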
