Prioritized Experience Replay based on Multi-armed Bandit
Liu, Ximing2; Zhu, Tianqing3; Jiang, Cuiqing2; Ye, Dayong3; Zhao, Fuqing1
Journal: EXPERT SYSTEMS WITH APPLICATIONS
Publication Date: 2022-03-01
Volume: 189
Keywords: Deep reinforcement learning; Q-learning; Deep Q-network; Experience replay; Multi-armed Bandit
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2021.116023
Abstract: Experience replay has been widely used in deep reinforcement learning. The technique allows online reinforcement learning agents to remember and reuse past experiences. To further improve the sampling efficiency of experience replay, the most useful experiences should be sampled with higher frequency. Existing methods usually design their sampling strategy according to a few criteria, but they tend to combine these criteria in a linear or fixed manner, so the strategy is static and independent of the learning agent. This ignores the dynamic nature of the environment and thus leads to suboptimal performance. In this work, we propose a dynamic experience replay strategy driven by the interaction between the agent and the environment, called Prioritized Experience Replay based on Multi-armed Bandit (PERMAB). PERMAB adaptively combines multiple priority criteria to measure the importance of each experience. In particular, the weight of each assessing criterion is adjusted from episode to episode according to its contribution to agent performance, which guarantees that the criteria useful in the current state are weighted more heavily. The proposed replay strategy takes both sample informativeness and diversity into consideration, which significantly boosts the learning ability and speed of the game agent. Experimental results show that PERMAB accelerates network learning and achieves better performance than baseline algorithms on seven benchmark environments of varying difficulty.
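The abstract describes the mechanism only at a high level: several priority criteria are combined into one sampling distribution, and a multi-armed bandit re-weights the criteria from episode to episode according to their contribution to agent performance. The sketch below illustrates that general idea in Python under explicit assumptions; the two criteria (TD-error magnitude and recency), the exponential-weights bandit update, and the use of episode-return improvement as the bandit reward are placeholders chosen for illustration, not details taken from the paper.

```python
# Illustrative sketch only: a replay buffer that mixes several priority criteria,
# with per-criterion weights updated by a simple exponential-weights bandit whose
# reward is the improvement in episode return. The criteria and the update rule
# are assumptions for illustration, not the method specified in the paper.

import numpy as np


class BanditWeightedReplay:
    def __init__(self, capacity=10000, n_criteria=2, eta=0.1):
        self.capacity = capacity
        self.buffer = []                          # stores (transition, |td_error|, insertion step)
        self.log_weights = np.zeros(n_criteria)   # one bandit log-weight per criterion
        self.eta = eta                            # bandit learning rate
        self.last_return = None                   # previous episode return (for the bandit reward)
        self.step = 0

    def add(self, transition, td_error):
        self.step += 1
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append((transition, abs(td_error), self.step))

    def _criterion_scores(self):
        # Criterion 1: TD-error magnitude (informativeness).
        td = np.array([b[1] for b in self.buffer]) + 1e-6
        # Criterion 2: recency, used here as a crude diversity proxy.
        recency = np.array([b[2] for b in self.buffer], dtype=float)
        recency = recency - recency.min() + 1e-6
        return [td / td.sum(), recency / recency.sum()]

    def mixture_weights(self):
        w = np.exp(self.log_weights - self.log_weights.max())
        return w / w.sum()

    def sample(self, batch_size):
        scores = self._criterion_scores()
        mix = self.mixture_weights()
        probs = sum(m * s for m, s in zip(mix, scores))   # convex combination of criteria
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i][0] for i in idx]

    def update_bandit(self, episode_return):
        # Credit each criterion with the return improvement in proportion to its
        # share of the sampling mixture, then apply an exponential-weights update.
        if self.last_return is not None:
            improvement = episode_return - self.last_return
            self.log_weights += self.eta * improvement * self.mixture_weights()
        self.last_return = episode_return
```

In a DQN-style training loop, `add` would be called once per transition with its current TD error, `sample` once per gradient step, and `update_bandit` once at the end of each episode with the episode return, so that the criterion mixture drifts toward whichever criteria have recently coincided with improving returns.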
WOS Research Areas: Computer Science; Engineering; Operations Research & Management Science
Language: English
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
WOS Accession Number: WOS:000714414800008
Content Type: Journal Article
Source URL: http://ir.lut.edu.cn/handle/2XXMBERH/154785
Collection: International Cooperation Office (Hong Kong, Macao and Taiwan Affairs Office)
Author Affiliations:
1. Lanzhou Univ Technol, Sch Comp & Commun Technol, Lanzhou 730050, Peoples R China
2. Hefei Univ Technol, Sch Management, Hefei, Anhui, Peoples R China
3. Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
Recommended Citation:
GB/T 7714: Liu, Ximing, Zhu, Tianqing, Jiang, Cuiqing, et al. Prioritized Experience Replay based on Multi-armed Bandit[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 189.
APA: Liu, Ximing, Zhu, Tianqing, Jiang, Cuiqing, Ye, Dayong, & Zhao, Fuqing. (2022). Prioritized Experience Replay based on Multi-armed Bandit. EXPERT SYSTEMS WITH APPLICATIONS, 189.
MLA: Liu, Ximing, et al. "Prioritized Experience Replay based on Multi-armed Bandit". EXPERT SYSTEMS WITH APPLICATIONS 189 (2022).
 
