Robust Multi-Step Bootstrapping

Journal of Advanced Technology Research, Vol. 5, No. 1, pp. 6-11, Jun. 2020
DOI: 10.11111/JATR.2020.5.1.006
Keywords: Reinforcement Learning, Monte Carlo method, TD learning, one-step TD method, n-step TD method, Ω-return, Q-learning
Abstract

n-step temporal-difference (TD) learning unifies the Monte Carlo method and the one-step TD method. The Monte Carlo method uses the complete return as the learning target, the one-step TD method uses the one-step return, and n-step TD learning uses the n-step return. However, the optimal value of n depends on the learning environment and other hyperparameters, so it is difficult to find. In this paper, we propose a new target value, called the Ω-return, apply it to Q-learning, and evaluate the robustness of the proposed method through experiments in various environments.
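For context, the n-step return mentioned in the abstract is the standard target from the n-step TD literature. Below is a minimal Python sketch of that target as it would be used in Q-learning; the function name and inputs are illustrative only, and the Ω-return itself is defined in the full text, so it is not reproduced here.

# n-step Q-learning target:
#   G_{t:t+n} = R_{t+1} + gamma*R_{t+2} + ... + gamma^(n-1)*R_{t+n}
#               + gamma^n * max_a Q(S_{t+n}, a)
def n_step_q_target(rewards, gamma, bootstrap_q_values):
    """rewards: [R_{t+1}, ..., R_{t+n}]; bootstrap_q_values: Q(S_{t+n}, a) for all a."""
    g = 0.0
    for k, r in enumerate(rewards):  # discounted sum of the n observed rewards
        g += (gamma ** k) * r
    g += (gamma ** len(rewards)) * max(bootstrap_q_values)  # bootstrap term
    return g

With n = len(rewards) = 1 this reduces to the one-step TD (Q-learning) target, while letting n span the whole episode and dropping the bootstrap term at a terminal state recovers the Monte Carlo return, which is the unification the abstract describes.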


Cite this article
[IEEE Style]
G.-Y. Hwang, J.-B. Kim, Y.-H. Han, "Robust Multi-Step Bootstrapping," Journal of Advanced Technology Research, vol. 5, no. 1, pp. 6-11, 2020. DOI: 10.11111/JATR.2020.5.1.006.

[ACM Style]
Gyu-Young Hwang, Ju-Bong Kim, and Youn-Hee Han. 2020. Robust Multi-Step Bootstrapping. Journal of Advanced Technology Research, 5, 1, (2020), 6-11. DOI: 10.11111/JATR.2020.5.1.006.