Multi-modal semantic autoencoder for cross-modal retrieval

doi:10.1016/j.neucom.2018.11.042


Title	Multi-modal semantic autoencoder for cross-modal retrieval
Authors	Wang, Shuhui 2; Wu, Yiling 1,2; Huang, Qingming 1
Journal	NEUROCOMPUTING
Publication date	2019-02-28
Volume	331 Pages: 165-175
Keywords	Cross-modal retrieval; Multi-modal data; Autoencoder
ISSN	0925-2312
DOI	10.1016/j.neucom.2018.11.042
Abstract	Cross-modal retrieval has gained much attention in recent years. Most existing approaches, which form the research mainstream, learn projections that map data from different modalities into a common space where the data can be compared directly. However, these approaches neglect to preserve feature and semantic information, and their retrieval results are therefore unsatisfactory. In this paper, we propose a two-stage learning method that learns multi-modal mappings projecting multi-modal data to low-dimensional embeddings that preserve both feature and semantic information. In the first stage, we combine low-level feature information and high-level semantic information to learn feature-aware semantic code vectors. In the second stage, we use an encoder-decoder paradigm to learn the projections: the encoder projects feature vectors to code vectors, and the decoder projects code vectors back to feature vectors. The encoder-decoder paradigm ensures that the embeddings preserve both feature and semantic information. An alternating minimization procedure is developed to solve the multi-modal semantic autoencoder optimization problem. Extensive experiments on three benchmark datasets demonstrate that the proposed method outperforms state-of-the-art cross-modal retrieval methods. © 2018 Elsevier B.V. All rights reserved.
Funding	National Natural Science Foundation of China [61672497]; National Natural Science Foundation of China [61332016]; National Natural Science Foundation of China [61620106009]; National Natural Science Foundation of China [61650202]; National Natural Science Foundation of China [U1636214]; National Basic Research Program of China (973 Program) [2015CB351802]; Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-SYS013]
WOS research area	Computer Science
Language	English
Publisher	ELSEVIER SCIENCE BV
WOS accession number	WOS:000455210900015
Content type	Journal article
Source URL	[http://119.78.100.204/handle/2XEOYT63/3476]
Collection	Institute of Computing Technology, Chinese Academy of Sciences: Journal Papers (English)
Corresponding author	Wang, Shuhui
Author affiliations	1. Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 100049, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
Recommended citation: GB/T 7714	Wang, Shuhui, Wu, Yiling, Huang, Qingming. Multi-modal semantic autoencoder for cross-modal retrieval[J]. NEUROCOMPUTING, 2019, 331: 165-175.
APA	Wang, Shuhui, Wu, Yiling, & Huang, Qingming. (2019). Multi-modal semantic autoencoder for cross-modal retrieval. NEUROCOMPUTING, 331, 165-175.
MLA	Wang, Shuhui, et al. "Multi-modal semantic autoencoder for cross-modal retrieval." NEUROCOMPUTING 331 (2019): 165-175.