Mutual Attention Inception Network for Remote Sensing Visual Question Answering

doi:10.1109/TGRS.2021.3079918


Title	Mutual Attention Inception Network for Remote Sensing Visual Question Answering
Authors	Zheng, Xiangtao 3; Wang, Binqiang 2; Du, Xingqian 2; Lu, Xiaoqiang 1
Journal	IEEE Transactions on Geoscience and Remote Sensing
Keywords	Attention mechanism; feature fusion; remote sensing visual question answering (RSVQA); semantic understanding
ISSN	0196-2892; 1558-0644
DOI	10.1109/TGRS.2021.3079918
Affiliation ranking	1
Abstract	Remote sensing images (RSIs) containing various ground objects have been applied in many fields. To make the semantic understanding of RSIs objective and interactive, the task of remote sensing visual question answering (VQA) has emerged. Given an RSI, the goal of remote sensing VQA is to have an intelligent agent answer a question about the remote sensing scene. Existing remote sensing VQA methods use a nonspatial fusion strategy to fuse the image features and question features, which ignores the spatial information of images and the word-level information of questions. A novel method is proposed that takes both aspects into account. First, convolutional features of the image are used to represent spatial information, and the word vectors of questions are adopted to represent word-level semantic information. Second, an attention mechanism and a bilinear technique are introduced to enhance the features by considering the alignments between spatial positions and words. Finally, a fully connected layer with softmax outputs an answer, treating the task as multiclass classification. To benchmark this task, an RSIVQA dataset is introduced in this article. For each of more than 37,000 RSIs, the proposed dataset contains at least one question plus the corresponding answers. Experimental results demonstrate that the proposed method can capture the alignments between images and questions. The code and dataset are available at https://github.com/spectralpublic/RSIVQA.
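The three steps named in the abstract (spatial image features plus word vectors, attention with a bilinear alignment score, and a softmax classifier) can be illustrated with a minimal sketch. This is not the authors' network, which is available at the GitHub link in the abstract. The function `answer_distribution`, the max-pooled attention scores, the concatenation-based fusion, the dimensions, and the random weights are all assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def weighted_sum(weights, vectors):
    """Attention-weighted sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def answer_distribution(regions, words, w_bilinear, w_cls):
    """Hypothetical fusion of spatial image features and word vectors.

    regions:    one feature vector per spatial position (dimension d_v)
    words:      one vector per question word (dimension d_q)
    w_bilinear: d_v x d_q matrix scoring region-word alignment (assumed)
    w_cls:      n_answers x (d_v + d_q) classifier weights (assumed)
    """
    # Bilinear affinity: affinity[i][j] = regions[i]^T W words[j].
    projected = [matvec(w_bilinear, w) for w in words]
    affinity = [[sum(a * b for a, b in zip(r, p)) for p in projected]
                for r in regions]
    # Attention in both directions: each region (word) is weighted by its
    # best alignment with any word (region). This pooling is an assumption.
    region_att = softmax([max(row) for row in affinity])
    word_att = softmax([max(col) for col in zip(*affinity)])
    attended_image = weighted_sum(region_att, regions)
    attended_question = weighted_sum(word_att, words)
    # Fully connected layer with softmax over the candidate answers.
    fused = attended_image + attended_question  # list concatenation
    return softmax(matvec(w_cls, fused))


def random_matrix(rows, cols):
    """Stand-in for learned weights or extracted features."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_v, d_q, n_regions, n_words, n_answers = 4, 3, 6, 5, 2
regions = random_matrix(n_regions, d_v)   # e.g. a flattened conv feature map
words = random_matrix(n_words, d_q)       # e.g. word embeddings of a question
probs = answer_distribution(
    regions, words,
    random_matrix(d_v, d_q),
    random_matrix(n_answers, d_v + d_q),
)
print(probs)  # one probability per candidate answer
```

In a trained model, the random matrices would be replaced by learned parameters, and the region and word vectors would come from a convolutional backbone and a word-embedding layer respectively.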
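Language	English
Publisher	Institute of Electrical and Electronics Engineers Inc.
Content type	Journal article
Source URL	http://ir.opt.ac.cn/handle/181661/94878
Collection	Xi'an Institute of Optics and Precision Mechanics_Center for Optical Imagery Analysis and Learning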
Author affiliations	1. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China (e-mail: luxq666666@gmail.com); 2. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China; 3. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China
Recommended citation
GB/T 7714	Zheng, Xiangtao, Wang, Binqiang, Du, Xingqian, et al. Mutual Attention Inception Network for Remote Sensing Visual Question Answering[J]. IEEE Transactions on Geoscience and Remote Sensing.
APA	Zheng, Xiangtao, Wang, Binqiang, Du, Xingqian, & Lu, Xiaoqiang.
MLA	Zheng, Xiangtao, et al. "Mutual Attention Inception Network for Remote Sensing Visual Question Answering". IEEE Transactions on Geoscience and Remote Sensing