Abstract: Traditional performance potential-based learning algorithms can obtain optimal policies in MDP problems.

英美

文章摘要: 传统基于性能势的学习算法能获得马尔可夫决策问题的最优策略。

目录

查词历史