The objective is to find an optimal policy that maximizes the expected average reward per time step over an infinite horizon.

  • The aim is to find the optimal control policy that maximizes the long-run expected average reward per stage.
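The average-reward criterion scores a policy π by its gain g(π) = lim_{T→∞} (1/T) E[r_0 + r_1 + ... + r_{T-1}], assuming the limit exists. For a small finite Markov decision process, one standard way to compute a gain-optimal policy is relative value iteration. The sketch below is a minimal illustration of that technique, not a method taken from the source. It assumes the MDP is unichain and aperiodic, which is the usual condition for this iteration to converge. The function name `relative_value_iteration`, the `P`/`R` layout, and the two-state example are all hypothetical.

```python
def relative_value_iteration(P, R, ref=0, tol=1e-9, max_iter=10_000):
    """Hypothetical sketch of relative value iteration for an average-reward MDP.

    P[s][a] is a list of next-state probabilities, R[s][a] is the expected
    one-step reward. Assumes a unichain, aperiodic MDP so the iteration converges.
    Returns (gain, relative values h, greedy policy).
    """
    n = len(P)
    h = [0.0] * n
    gain = 0.0
    q = []
    for _ in range(max_iter):
        # One Bellman backup: q[s][a] = r(s, a) + sum_{s'} P(s' | s, a) * h(s')
        q = [[R[s][a] + sum(p * h[t] for t, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n)]
        th = [max(qs) for qs in q]
        # Subtracting the value at a reference state keeps h bounded;
        # th[ref] converges to the optimal average reward per step.
        gain = th[ref]
        new_h = [v - gain for v in th]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # Greedy policy with respect to the last backup
    policy = [max(range(len(q[s])), key=lambda a: q[s][a]) for s in range(n)]
    return gain, h, policy


# Hypothetical two-state example: action 0 = "stay", action 1 = "switch state".
# Staying in state 0 pays 1 per step, staying in state 1 pays 2, switching pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: stay -> 0, switch -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: stay -> 1, switch -> 0
]
R = [
    [1.0, 0.0],  # rewards in state 0
    [2.0, 0.0],  # rewards in state 1
]

gain, h, policy = relative_value_iteration(P, R)
print(gain, policy)  # expected: 2.0 [1, 0]
```

In this example, the best long-run behavior is to switch from state 0 to state 1 once and then stay there, so the per-step average reward approaches 2 even though the single switching step earns nothing. That is exactly the sense in which the criterion looks only at the infinite-horizon average.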