Abstract: For cyclical tasks with absorbing goal states, a reasonable approach is reinforcement learning based on the average-reward model. Average-reward reinforcement learning offers fast convergence and strong robustness.
