Delayed Reward Decomposition
Bias-Variance for MDP
Bias and Variance of the Action-Value Function
$$
q^{\pi}(s,a) = \sum
$$
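This is a minimal sketch of the quantity $q^{\pi}(s,a)$ whose estimates this section analyzes. It assumes the standard discounted Bellman expectation equation $q^{\pi}(s,a) = \sum_{s',r} p(s',r \mid s,a)\,[\,r + \gamma \sum_{a'} \pi(a' \mid s')\, q^{\pi}(s',a')\,]$. It also assumes a hypothetical two-state episodic MDP, a hypothetical policy `PI`, and $\gamma = 0.9$; none of these come from the original text.

The code does two things. First, it computes $q^{\pi}$ exactly by repeated Bellman backups. Second, it compares that value with a Monte Carlo estimate, the sample mean of sampled returns. The Monte Carlo estimate is unbiased for $q^{\pi}(s,a)$, and the spread of the individual returns is the variance term that a bias-variance analysis is concerned with.

```python
import random
import statistics

# Hypothetical toy episodic MDP (illustration only, not from the original text).
# P[s][a] is a list of (probability, next_state, reward); state 2 is terminal.
P = {
    0: {0: [(0.8, 1, 0.0), (0.2, 2, 1.0)],
        1: [(1.0, 2, 0.5)]},
    1: {0: [(0.5, 0, 0.0), (0.5, 2, 2.0)],
        1: [(1.0, 2, 1.0)]},
}
TERMINAL = 2
PI = {0: [0.5, 0.5], 1: [0.3, 0.7]}  # assumed stochastic policy pi(a|s)
GAMMA = 0.9                           # assumed discount factor


def exact_q(tol=1e-12):
    """Apply the Bellman expectation backup for q^pi in place until it converges."""
    q = {s: [0.0, 0.0] for s in P}
    while True:
        delta = 0.0
        for s in P:
            for a in P[s]:
                new = 0.0
                for prob, s2, r in P[s][a]:
                    # v^pi(s') = sum_a' pi(a'|s') q(s', a'); zero at the terminal state
                    v_next = 0.0 if s2 == TERMINAL else sum(
                        PI[s2][a2] * q[s2][a2] for a2 in range(2))
                    new += prob * (r + GAMMA * v_next)
                delta = max(delta, abs(new - q[s][a]))
                q[s][a] = new
        if delta < tol:
            return q


def sample_step(s, a, rng):
    """Sample (next_state, reward) from P[s][a]."""
    u, cum = rng.random(), 0.0
    for prob, s2, r in P[s][a]:
        cum += prob
        if u < cum:
            return s2, r
    _, s2, r = P[s][a][-1]  # guard against floating-point round-off
    return s2, r


def mc_returns(s, a, n, rng):
    """Sample n discounted returns: start with (s, a), then follow pi to termination."""
    returns = []
    for _ in range(n):
        g, discount = 0.0, 1.0
        state, action = s, a
        while True:
            s2, r = sample_step(state, action, rng)
            g += discount * r
            discount *= GAMMA
            if s2 == TERMINAL:
                break
            state = s2
            action = rng.choices(range(2), weights=PI[state])[0]
        returns.append(g)
    return returns


if __name__ == "__main__":
    q = exact_q()
    rets = mc_returns(0, 0, 20000, random.Random(0))
    print("exact q(0,0):", q[0][0])
    print("MC mean:", statistics.mean(rets), "sample variance:", statistics.variance(rets))
```

The MC mean should land close to the exact value. The sample variance does not shrink as more episodes are added. What shrinks is the variance of the mean, roughly as 1/n. This is why high-variance returns, for example under delayed rewards, make Monte Carlo estimates of $q^{\pi}$ expensive.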