Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation
Li, Qianzhong1,2; Zhang, Yujia1; Sun, Shiying1; Wu, Jinting1,2; Zhao, Xiaoguang1; Tan, Min1
Journal: Neurocomputing
Date: 2022-01-07
Volume: 467  Issue: /  Pages: 99-114
Keywords: Referring expression comprehension; Referring expression segmentation; Cross-modality synergy; Attention mechanism
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2021.09.066
Abstract

Referring expression comprehension and segmentation aim to locate and segment a referred instance in an image according to a natural language expression. However, existing methods tend to ignore the interaction between visual and language modalities for visual feature learning, and establishing a synergy between the visual and language modalities remains a considerable challenge. To tackle the above problems, we propose a novel end-to-end framework, Cross-Modality Synergy Network (CMS-Net), to address the two tasks jointly. In this work, we propose an attention-aware representation learning module to learn modal representations for both images and expressions. A language self-attention submodule is proposed in this module to learn expression representations by leveraging the intra-modality relations, and a language-guided channel-spatial attention submodule is introduced to obtain the language aware visual representations under language guidance, which helps the model pay more attention to the referent-relevant regions in the images and relieve background interference. Then, we design a cross-modality synergy module to establish the inter-modality relations for modality fusion. Specifically, a language-visual similarity is obtained at each position of the visual feature map, and the synergy is achieved between the two modalities in both semantic and spatial dimensions. Furthermore, we propose a multi-scale feature fusion module with a selective strategy to aggregate the important information from multi-scale features, yielding target results. We conduct extensive experiments on four challenging benchmarks, and our framework achieves significant performance gains over state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
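To make the abstract's core idea concrete, below is a minimal pure-Python sketch of the language-visual similarity step it describes: a similarity score is computed between a language embedding and the visual feature at each spatial position, and the normalized scores are used to pool a language-aware visual representation. All tensors, dimensions, and values here are toy placeholders for illustration, not the actual CMS-Net implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    """Dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy visual feature map: 3 spatial positions, each a 4-dim feature vector.
visual = [
    [1.0, 0.0, 0.5, 0.2],
    [0.1, 0.9, 0.4, 0.3],
    [0.2, 0.1, 0.8, 0.7],
]
# Toy language embedding, assumed projected into the same 4-dim space.
lang = [0.9, 0.1, 0.6, 0.2]

# Language-visual similarity at each spatial position, softmax-normalized.
scores = [dot(v, lang) for v in visual]
weights = softmax(scores)

# Language-aware visual representation: similarity-weighted sum over positions,
# so referent-relevant regions contribute more than background regions.
attended = [sum(w * v[c] for w, v in zip(weights, visual)) for c in range(4)]
```

In the paper this per-position similarity drives the cross-modality synergy in both the semantic and spatial dimensions; the sketch above shows only the spatial weighting side of that idea.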

Funding Projects: National Key Research and Development Project of China [2019YFB1310601]; National Key R&D Program of China [2017YFC0820203-03]; National Natural Science Foundation of China [62103410]
WOS Research Area: Computer Science
Language: English
Publisher: ELSEVIER
WOS Record Number: WOS:000710121100009
Funding Agencies: National Key Research and Development Project of China; National Key R&D Program of China; National Natural Science Foundation of China
Content Type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/46309]
Collection: Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Advanced Robot Control Team
Corresponding Author: Zhang, Yujia
Author Affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended Citation
GB/T 7714: Li, Qianzhong, Zhang, Yujia, Sun, Shiying, et al. Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation[J]. Neurocomputing, 2022, 467(/): 99-114.
APA: Li, Qianzhong, Zhang, Yujia, Sun, Shiying, Wu, Jinting, Zhao, Xiaoguang, & Tan, Min. (2022). Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation. Neurocomputing, 467(/), 99-114.
MLA: Li, Qianzhong, et al. "Cross-Modality Synergy Network for Referring Expression Comprehension and Segmentation". Neurocomputing 467./ (2022): 99-114.