Featured Articles

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Date: 2019-06-12

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which can efficiently learn reinforcement learning (RL) problems modeled by a Markov decision process (MDP) with a finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^*$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SAHT})$ for an MDP with $S$ states and $A$ actions, in the case that an upper bound $H$ on the span of $h^*$, i.e., $\mathrm{sp}(h^*)$, is known. This result outperforms the best previous regret bound $\tilde{O}(HS\sqrt{AT})$ [Bartlett and Tewari, 2009] by a factor of $\sqrt{SH}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SAHT})$ [Jaksch et al., 2010] up to a logarithmic factor. As a consequence, we show that there is a near-optimal regret bound of $\tilde{O}(\sqrt{SADT})$ for MDPs with finite diameter $D$, compared to the lower bound of $\Omega(\sqrt{SADT})$ [Jaksch et al., 2010].
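
The claimed $\sqrt{SH}$ improvement over Bartlett and Tewari [2009] can be checked directly by dividing the two bounds, ignoring logarithmic factors:

```latex
\frac{HS\sqrt{AT}}{\sqrt{SAHT}}
  = \sqrt{\frac{H^{2}S^{2}AT}{SAHT}}
  = \sqrt{SH}
```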
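
For readers unfamiliar with the OFU principle, below is a minimal sketch of optimism in its simplest setting: a single-state MDP, i.e., a multi-armed bandit, handled with the classical UCB1 rule. This is not the paper's algorithm, which applies the same optimism idea to general finite MDPs via the optimal bias function h*; the toy environment, constants, and function names here are illustrative assumptions only.

```python
import numpy as np

def ucb1(true_means, T, seed=0):
    """Optimism in the Face of Uncertainty on a toy Bernoulli bandit (UCB1).

    Illustrative sketch only; NOT the paper's MDP algorithm. Each step picks
    the arm whose optimistic estimate (empirical mean + confidence bonus) is
    largest, so under-explored arms keep being tried until their uncertainty
    shrinks.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)   # number of pulls per arm
    sums = np.zeros(n_arms)     # accumulated reward per arm
    regret = 0.0
    best = max(true_means)
    for t in range(1, T + 1):
        if t <= n_arms:
            a = t - 1                                   # pull each arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t) / counts)   # optimism bonus
            a = int(np.argmax(means + bonus))           # optimistic choice
        r = float(rng.random() < true_means[a])         # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        regret += best - true_means[a]                  # pseudo-regret
    return regret

# Example: regret grows only logarithmically in T for this toy problem.
print(ucb1([0.3, 0.5, 0.7], T=10_000))
```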
