Adaptive Reinforcement Learning Frameworks for Dynamic Tax Policy Decision Support
Keywords:
Economic modeling; decision support systems; tax policy; reinforcement learning; simulation
Abstract
Tax policy aims to stabilize the economy and to fund the basic public services that the domestic economy needs. Governments also adjust it dynamically to dampen economic fluctuations. Tax revenue depends on many internal and external factors, which makes it difficult for governments to predict, and tax noncompliance and evasion further reduce the effectiveness of tax policy. Formulating tax policy is therefore difficult and should rest on a predictive model that can provide reliable decision support. Reinforcement learning (RL) is a branch of machine learning that forms a policy through reward-driven interaction with an environment. Casting tax policy as a control problem within an RL framework allows the policy to be optimized adaptively and autonomously.
Tax policy needs to improve the return on taxation while pursuing other economic goals and keeping taxation stable, in order to encourage compliance and avoid distortion of the tax base. It is therefore necessary to ensure that improving the return on taxation does not weaken investment incentives, whether domestic or foreign. For public systems such as social security and education, short-term expenditure and its cycle need to be planned rationally, through a counter-cyclical tax policy stance that follows economic needs, within the overall revenue and expenditure plan.
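As a rough illustration of what placing tax policy in an RL framework can look like, the sketch below runs tabular Q-learning on a toy economy. It is not the framework proposed in this paper. The three regimes, the rate grid, the base sizes, the sensitivity values, the transition rule, and the revenue-only reward are all hypothetical assumptions chosen to keep the example small.

```python
import random

# Hypothetical toy numbers chosen for illustration; none come from the paper.
RATES = [0.10, 0.20, 0.30]     # actions: candidate tax rates
BASES = [80.0, 100.0, 120.0]   # states: taxable base in downturn, normal, boom
SENSITIVITY = [4.0, 2.5, 1.5]  # assumed base erosion per unit of tax rate


def step(state, action, rng):
    """Simulate one period and return (reward, next_state)."""
    rate = RATES[action]
    # Assumed behavioural response: a higher rate shrinks the taxable base.
    base = BASES[state] * max(0.0, 1.0 - SENSITIVITY[state] * rate)
    reward = rate * base  # revenue collected this period
    # Assumed regime dynamics: persist with prob. 0.6, else redraw uniformly.
    next_state = state if rng.random() < 0.6 else rng.randrange(len(BASES))
    return reward, next_state


def greedy(q_row):
    """Index of the highest-valued action in one row of the Q-table."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning(steps=100_000, alpha=0.005, gamma=0.5, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over one long simulated trajectory."""
    rng = random.Random(seed)
    q = [[0.0] * len(RATES) for _ in BASES]
    state = 1  # start in the normal regime
    for _ in range(steps):
        if rng.random() < eps:
            action = rng.randrange(len(RATES))  # explore
        else:
            action = greedy(q[state])           # exploit
        reward, nxt = step(state, action, rng)
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt
    return q


q_table = q_learning()
policy = [RATES[greedy(row)] for row in q_table]
print(policy)  # learned rate for downturn, normal, boom
```

Under these assumed numbers, the revenue-maximizing rates on the grid are 10% in the downturn, 20% in the normal regime, and 30% in the boom, so the learned policy should approach that counter-cyclical pattern. A realistic reward would also need terms for stability, compliance, and investment incentives, which this sketch leaves out.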
Data Availability Statement
None