Abstract: For cyclical tasks with absorbing goal states, a reasonable choice is reinforcement learning based on the average-reward model. Average-reward reinforcement learning has the advantages of fast convergence and strong robustness.
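The abstract names average-reward reinforcement learning but does not show an update rule. The minimal sketch below assumes a tabular setting. It uses a differential (average-reward) Q-learning update on a hypothetical corridor task, which is not necessarily the algorithm the abstract refers to. The TD error subtracts a learned reward-rate estimate `r_bar` instead of applying a discount factor. In the corridor, the absorbing goal is turned into a cycle by sending the agent back to state 0 when the goal is entered. The names `step`, `greedy`, `differential_q_learning`, the corridor itself, and all parameter values are illustrative assumptions.

```python
import random

# Hypothetical cyclic corridor: states 0..n_states-1, where n_states-1 is the
# absorbing goal. Entering the goal pays reward 1 and returns the agent to
# state 0, so the task repeats as a cycle. All names here are illustrative.
LEFT, RIGHT = 0, 1


def step(state: int, action: int, n_states: int) -> tuple[int, float]:
    """Return (next_state, reward) for the hypothetical corridor."""
    goal = n_states - 1
    if action == RIGHT:
        nxt = state + 1
        if nxt == goal:
            return 0, 1.0  # goal reached: collect reward, restart the cycle
        return nxt, 0.0
    return max(state - 1, 0), 0.0  # moving left is blocked at state 0


def greedy(q_row: list[float], rng: random.Random) -> int:
    """Greedy action with random tie-breaking (avoids a fixed bias at start)."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def differential_q_learning(n_states: int = 5, steps: int = 50_000,
                            alpha: float = 0.1, eta: float = 0.1,
                            epsilon: float = 0.1, seed: int = 0):
    """Tabular average-reward Q-learning on the hypothetical corridor."""
    rng = random.Random(seed)
    n_live = n_states - 1  # the goal itself is never occupied
    q = [[0.0, 0.0] for _ in range(n_live)]
    r_bar = 0.0  # running estimate of the average reward per step
    s = 0
    for _ in range(steps):
        # Epsilon-greedy behaviour policy.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = greedy(q[s], rng)
        s2, r = step(s, a, n_states)
        # TD error uses the reward rate r_bar in place of a discount factor.
        delta = r - r_bar + max(q[s2]) - q[s][a]
        q[s][a] += alpha * delta
        r_bar += eta * alpha * delta  # reward-rate estimate tracks the TD error
        s = s2
    return q, r_bar


if __name__ == "__main__":
    q, r_bar = differential_q_learning()
    policy = [max(range(2), key=lambda a: row[a]) for row in q]
    print("greedy policy:", policy, "estimated reward rate:", round(r_bar, 3))
```

In this corridor, the best achievable average reward is one reward every `n_states - 1` steps, which is 0.25 for the default `n_states=5`. With enough steps, `r_bar` should settle near that value, and the greedy policy should move right in every state.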