Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection
Jiao, Yifan [1]; Li, Zhetao [2]; Huang, Shucheng [1]; Yang, Xiaoshan [3,4]; Liu, Bin [5]; Zhang, Tianzhu [3,4]
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
Publication Date: 2018-10
Volume: 20; Issue: 10; Pages: 2693-2705
Keywords: Video Highlight Detection; Attention Model; Deep Ranking
Abstract
The video highlight detection task is to localize key
elements (moments of user’s major or special interest) in a video.
Most of the existing highlight detection approaches extract features
from the video segment as a whole without considering the
differences among local features, both temporally and spatially. Due to
the complexity of video content, such mixed features can
degrade the final highlight prediction. In the temporal extent, not all
frames are worth watching because some of them contain only the
background of the environment, without humans or other moving
objects. Similarly, in the spatial extent, not all regions in each
frame are highlights, especially when there is a lot of clutter in
the background. To solve the above problem, we propose a novel
three-dimensional (3-D) (spatial+temporal) attention model that
can automatically localize the key elements in a video without any
extra supervised annotations. Specifically, the proposed attention
model produces attention weights of local regions along both the
spatial and temporal dimensions of the video segment. The regions
of key elements in the video will be strengthened with large weights.
Thus, the more effective feature of the video segment is obtained to
predict the highlight score. The proposed 3-D attention scheme can
be easily integrated into a conventional end-to-end deep ranking
model that aims to learn a deep neural network to compute the
highlight score of each video segment. Extensive experimental
results on the YouTube and SumMe datasets demonstrate that the
proposed approach achieves significant improvement over state-of-
the-art methods. With the proposed 3-D attention model, video
highlights can be accurately retrieved in spatial and temporal
dimensions without human supervision in several domains, such
as gymnastics, parkour, skating, skiing, surfing, and dog activities,
on the public datasets.
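
As an illustration only, the idea of weighting local regions along the spatial and temporal dimensions and then ranking segments by score can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' architecture: the linear scoring vectors `w_att` and `w_score`, the single joint softmax over all time-space positions, the hinge-style pairwise loss, and the random features are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attention_pool(features, w_att):
    """Pool local features with attention weights.

    features: array of shape (T, H, W, C), one C-dim feature per
              frame t and spatial cell (h, w).
    w_att:    array of shape (C,), a hypothetical scoring vector.
    """
    scores = features @ w_att                      # (T, H, W)
    # Assumption: one softmax jointly over time and space.
    weights = softmax(scores.ravel()).reshape(scores.shape)
    pooled = (weights[..., None] * features).sum(axis=(0, 1, 2))  # (C,)
    return pooled, weights

def highlight_score(features, w_att, w_score):
    # Hypothetical linear scorer on top of the attended feature.
    pooled, _ = attention_pool(features, w_att)
    return float(pooled @ w_score)

def pairwise_ranking_loss(score_highlight, score_non_highlight, margin=1.0):
    # Hinge loss: a highlight segment should outscore a
    # non-highlight segment by at least `margin`.
    return max(0.0, margin - score_highlight + score_non_highlight)

# Toy data: random features stand in for real network activations.
rng = np.random.default_rng(0)
T, H, W, C = 4, 3, 3, 8
feats = rng.normal(size=(T, H, W, C))
w_att = rng.normal(size=C)
w_score = rng.normal(size=C)

pooled, weights = attention_pool(feats, w_att)
score = highlight_score(feats, w_att, w_score)
```

In this sketch the attention weights form a distribution over all T x H x W positions, so positions with large weights dominate the pooled feature used for scoring. No segment-level localization labels are needed to compute them, only the ranking signal between segment pairs.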
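Language: English
WOS Accession Number: WOS:000444903000013
Content Type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/22067]
Collection: Institute of Automation_National Laboratory of Pattern Recognition_Multimedia Computing and Graphics Team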
Author Affiliations:
1. Jiangsu University of Science and Technology
2. College of Information Engineering, Xiangtan University
3. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
4. University of Chinese Academy of Sciences
5. Moshanghua Tech Company
Recommended Citation
GB/T 7714
Jiao, Yifan, Li, Zhetao, Huang, Shucheng, et al. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20(10): 2693-2705.
APA: Jiao, Yifan, Li, Zhetao, Huang, Shucheng, Yang, Xiaoshan, Liu, Bin, & Zhang, Tianzhu. (2018). Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection. IEEE TRANSACTIONS ON MULTIMEDIA, 20(10), 2693-2705.
MLA: Jiao, Yifan, et al. "Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection". IEEE TRANSACTIONS ON MULTIMEDIA 20.10 (2018): 2693-2705.