Delayed Reward Decomposition

Bias-Variance for MDP

Bias and Variance of the Action-Value Function

$$
q^{\pi}(s,a) = \sum
$$