Using deep Q-learning to understand the tax evasion behavior of risk-averse firms

Designing tax policies that curb tax evasion and maximize state revenues requires a rigorous understanding of taxpayer behavior. This work explores the problem of determining the strategy that a self-interested, risk-averse tax entity is expected to follow as it "navigates", in the context of a Markov Decision Process, a government-controlled tax environment that includes random audits, penalties and occasional tax amnesties. Although simplified versions of this problem have been explored previously, the mere assumption of risk aversion (as opposed to risk neutrality) raises the complexity of finding the optimal policy well beyond the reach of analytical techniques. Here, we obtain approximate solutions via a combination of Q-learning and recent advances in Deep Reinforcement Learning. By doing so, we i) determine the tax evasion behavior expected of the taxpayer entity, ii) calculate the degree of risk aversion of the "average" entity given empirical estimates of tax evasion, and iii) evaluate sample tax policies in terms of expected revenues. Our model can serve as a testbed for "in-vitro" testing of tax policies, and our results lead to various policy recommendations.
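To make the setting concrete, the sketch below runs plain tabular Q-learning (not the deep variant used in this work) on a toy tax environment. Everything in it is an illustrative assumption rather than the paper's model: the state is the number of evaded years still open to audit, the action is to declare honestly or to evade, amnesties are left out, risk aversion is represented by an exponential (CARA) utility, and all constants and function names are hypothetical.

```python
import numpy as np

# All constants are illustrative assumptions, not values from the paper.
INCOME = 1.0      # yearly income, normalised
TAX_RATE = 0.3    # assumed flat tax rate
PENALTY = 2.0     # fine, as a multiple of the tax evaded per hidden year
MAX_HIDDEN = 3    # cap on the number of evaded years still auditable


def utility(x, risk_aversion):
    """CARA utility: concave for risk_aversion > 0, linear (risk-neutral) at 0."""
    if risk_aversion == 0:
        return x
    return (1.0 - np.exp(-risk_aversion * x)) / risk_aversion


def step(state, action, audit_prob, risk_aversion, rng):
    """Simulate one tax year; action 1 = evade, action 0 = declare honestly."""
    hidden = min(state + action, MAX_HIDDEN)
    tax_paid = 0.0 if action == 1 else TAX_RATE * INCOME
    if rng.random() < audit_prob:
        # An audit fines every hidden year and clears the record.
        fine = PENALTY * TAX_RATE * INCOME * hidden
        next_state = 0
    else:
        fine = 0.0
        next_state = hidden
    return next_state, utility(INCOME - tax_paid - fine, risk_aversion)


def q_learning(audit_prob, risk_aversion, steps=20_000, alpha=0.1,
               gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns a (MAX_HIDDEN + 1, 2) table."""
    rng = np.random.default_rng(seed)
    q = np.zeros((MAX_HIDDEN + 1, 2))
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = int(rng.integers(2))      # explore
        else:
            action = int(np.argmax(q[state]))  # exploit
        next_state, reward = step(state, action, audit_prob, risk_aversion, rng)
        target = reward + gamma * q[next_state].max()
        q[state, action] += alpha * (target - q[state, action])
        state = next_state
    return q
```

Calling `q_learning(audit_prob, risk_aversion)` and taking `np.argmax` along the last axis gives the greedy evade-or-declare choice in each state. Sweeping `risk_aversion` and `audit_prob` in this toy model is one way to see how the learned behavior shifts with the taxpayer's risk attitude and with enforcement.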
