| 1. | For joint-action learners, a team stochastic game framework for multi-agent cooperative reinforcement learning (MACRL) is presented. To solve the problem of multiple optimal equilibria in stochastic games, a MACRL method based on a joint-action priority sequence, MACRL-JAPS, is proposed. The main research achievements and innovations are two MACRL methods for the pursuit game, both of which have been validated experimentally. |