Reinforcement Learning (4): Actor-Critic Methods
Popularity: 14 Published: 2023-12-12 01:06:30.0

Main idea:

Policy Network (Actor)

Value Network (Critic):
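
In the usual actor-critic setup (assumed here, with θ and w as conventional names for the two parameter sets), the policy network π(a|s;θ) stands in for the policy and the value network q(s,a;w) stands in for the action-value function Q_π. Under that assumption the state value is approximated as:

```latex
V_\pi(s) = \sum_a \pi(a \mid s)\, Q_\pi(s, a)
\;\approx\;
V(s; \theta, w) = \sum_a \pi(a \mid s; \theta)\, q(s, a; w)
```

The actor is trained to make this quantity larger, while the critic is trained to make q a better estimate of the return.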

Intuitive analogy:

Train the Neural Networks

Steps:

Update value network q using TD
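
A minimal sketch of one TD update for the critic, in plain Python. It assumes a linear critic q(s, a; w) = w[a] · phi(s) in place of a neural network so that the gradient can be written by hand. The names `phi`, `alpha` and `gamma` (feature vector, learning rate, discount factor) are illustrative and do not come from the text above.

```python
def q_value(w, phi, a):
    """Linear critic (an assumption for this sketch): q(s, a; w) = w[a] . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w[a], phi))


def td_update(w, phi, a, r, phi_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one TD step to w in place and return the TD error."""
    td_target = r + gamma * q_value(w, phi_next, a_next)
    td_error = q_value(w, phi, a) - td_target
    # For a linear critic the gradient of q(s, a; w) w.r.t. w[a] is phi(s),
    # so this is gradient descent on 0.5 * td_error ** 2 with the target held fixed.
    w[a] = [wi - alpha * td_error * xi for wi, xi in zip(w[a], phi)]
    return td_error
```

With a neural network the same update would typically be obtained from automatic differentiation of q rather than from the closed-form gradient used here.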

Update policy network π using policy gradient
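
A minimal sketch of one policy gradient step for the actor, in plain Python. It assumes a softmax policy over linear logits theta[a] · phi(s), for which the gradient of log π(a|s;θ) with respect to theta[b] is (1[a = b] − π(b|s;θ)) · phi(s). The critic's score `q_sa` is passed in as a number, and `beta` is an illustrative learning rate.

```python
import math


def policy_probs(theta, phi):
    """Softmax policy (an assumption for this sketch) over logits theta[a] . phi(s)."""
    logits = [sum(t * x for t, x in zip(row, phi)) for row in theta]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_update(theta, phi, a, q_sa, beta=0.1):
    """Gradient ascent on q_sa * log pi(a | s; theta); updates theta in place."""
    probs = policy_probs(theta, phi)
    for b in range(len(theta)):
        # d log pi(a | s) / d theta[b] = (1[a == b] - pi(b | s)) * phi(s)
        coeff = (1.0 if b == a else 0.0) - probs[b]
        theta[b] = [t + beta * q_sa * coeff * x for t, x in zip(theta[b], phi)]
    return probs
```

When `q_sa` is positive the step raises the probability of the sampled action, and when it is negative the step lowers it.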

Actor-Critic Method




Summary of Algorithm
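
One common way to organize a single iteration is sketched below, as an assumption rather than the author's exact listing: sample an action, observe the reward and next state, sample a second action that is used only inside the TD target, then update the critic and the actor. The linear critic, softmax actor, and the one-state task in `train` are all hypothetical choices made to keep the example self-contained.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def policy_probs(theta, phi):
    """Softmax actor over linear logits theta[a] . phi(s)."""
    logits = [dot(row, phi) for row in theta]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def actor_critic_step(theta, w, phi, a, r, phi_next, a_next,
                      alpha=0.1, beta=0.1, gamma=0.9):
    """Update critic w and actor theta in place for one transition."""
    q_t = dot(w[a], phi)                    # q(s_t, a_t; w)
    q_next = dot(w[a_next], phi_next)       # q(s_{t+1}, a_next; w)
    td_error = q_t - (r + gamma * q_next)
    probs = policy_probs(theta, phi)
    # Critic: gradient descent on the TD error (gradient of q w.r.t. w[a] is phi).
    w[a] = [wi - alpha * td_error * x for wi, x in zip(w[a], phi)]
    # Actor: gradient ascent, weighting the log-policy gradient by q_t.
    for b in range(len(theta)):
        coeff = (1.0 if b == a else 0.0) - probs[b]
        theta[b] = [t + beta * q_t * coeff * x for t, x in zip(theta[b], phi)]
    return td_error


def train(num_steps=500, seed=0):
    """Hypothetical one-state task: action 1 pays reward 1, action 0 pays 0."""
    rng = random.Random(seed)
    phi = [1.0]                 # single state, constant feature
    theta = [[0.0], [0.0]]      # actor parameters, one row per action
    w = [[0.0], [0.0]]          # critic parameters, one row per action
    for _ in range(num_steps):
        a = rng.choices([0, 1], weights=policy_probs(theta, phi))[0]
        r = 1.0 if a == 1 else 0.0
        # Sampled only for the TD target; it is not executed in the environment.
        a_next = rng.choices([0, 1], weights=policy_probs(theta, phi))[0]
        actor_critic_step(theta, w, phi, a, r, phi, a_next)
    return theta, w
```

Calling `train()` returns the learned parameters, and `policy_probs(theta, [1.0])` then gives the action probabilities under the trained actor. A frequent variant replaces `q_t` in the actor update with the TD error as a baseline-corrected signal; that is a different design choice and is not what this sketch does.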


Summary
Policy Network and Value Network


Training
