Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.
Wang, Yali; Du, Wenbin; Qiao, Yu
Journal: IEEE TRANSACTIONS ON IMAGE PROCESSING
2018
Document subtype: Journal article
English abstract: Recent years have witnessed the popularity of recurrent neural networks (RNNs) for action recognition in videos. However, videos are high-dimensional and contain rich human dynamics at various motion scales, which makes it difficult for traditional RNNs to capture complex action information. In this paper, we propose a novel recurrent spatial-temporal attention network (RSTAN) to address this challenge, in which we introduce a spatial-temporal attention mechanism to adaptively identify key features from the global video context for every time-step prediction of the RNN. More specifically, we make three main contributions. First, we reinforce the classical long short-term memory (LSTM) with a novel spatial-temporal attention module. At each time step, our module automatically learns a spatial-temporal action representation from all sampled video frames, which is compact and highly relevant to the prediction at the current step. Second, we design an attention-driven appearance-motion fusion strategy to integrate appearance and motion LSTMs into a unified framework, in which the LSTMs and their spatial-temporal attention modules in the two streams can be jointly trained in an end-to-end fashion. Third, we develop actor-attention regularization for RSTAN, which guides our attention mechanism to focus on the important action regions around actors. We evaluate the proposed RSTAN on the benchmark UCF101, HMDB51 and JHMDB data sets. The experimental results show that our RSTAN outperforms other recent RNN-based approaches on UCF101 and HMDB51 and achieves state-of-the-art performance on JHMDB.
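The attention step described in the abstract (weighting features from all sampled frames by their relevance to the current prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spatial_temporal_attention`, the dot-product scoring, and the use of a plain query vector in place of the LSTM hidden state are all assumptions made for illustration, since the abstract does not specify the scoring function.

```python
import math

def spatial_temporal_attention(features, hidden):
    """Hypothetical soft attention over all frames and spatial locations.

    features: list of T frames, each a list of K location vectors of length D.
    hidden:   query vector of length D (a stand-in for the LSTM hidden state).
    Returns (context, weights): a D-dimensional summary and T*K weights.
    """
    # Flatten (frame, location) positions so attention spans the whole video.
    flat = [vec for frame in features for vec in frame]
    # Dot-product relevance score per position (assumed scoring function).
    scores = [sum(f * h for f, h in zip(vec, hidden)) for vec in flat]
    # Numerically stable softmax over all T*K positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The attention-weighted sum yields one compact representation.
    dim = len(hidden)
    context = [sum(w * vec[d] for w, vec in zip(weights, flat))
               for d in range(dim)]
    return context, weights

# Toy input: 2 frames, 2 locations per frame, 2-dimensional features.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [0.0, 0.0]]]
context, weights = spatial_temporal_attention(features, [1.0, 0.0])
```

In this toy input, the positions whose features align with the query receive the larger weights, and the weights sum to one across both frames, so a single context vector summarizes the whole clip for the current step.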
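Language: English
Content type: Journal article
Source URL: http://ir.siat.ac.cn:8080/handle/172644/13467
Collection: Shenzhen Institutes of Advanced Technology, Institute of Advanced Integration Technology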
Recommended citation
GB/T 7714
Wang, Yali, Du, Wenbin, Qiao, Yu. Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018.
APA Wang, Yali, Du, Wenbin, & Qiao, Yu. (2018). Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos. IEEE TRANSACTIONS ON IMAGE PROCESSING.
MLA Wang, Yali, et al. "Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos." IEEE TRANSACTIONS ON IMAGE PROCESSING (2018).