VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation

doi:10.1109/TNNLS.2022.3161314

Authors	Hao, Wangli (1,5); Guan, He (1,4); Zhang, Zhaoxiang (2,3)
Journal	IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Date	2022-04-08
Pages	13
Keywords	Task analysis; Instruments; Visualization; Image reconstruction; Generators; Decoding; Generative adversarial networks; Cross modality; cross-modal generation; mutual generation; visual and audio
ISSN	2162-237X
DOI	10.1109/TNNLS.2022.3161314
Corresponding author	Zhang, Zhaoxiang (zhaoxiang.zhang@ia.ac.cn)
Abstract	Considering both the audio and visual modalities helps in understanding a video. Under severe environmental interference or signal packet loss, automatically compensating for missing audio or visual signals is a challenging task. We propose a dynamic cross-modal visual-audio mutual generation model (VAMG), which includes audio-to-visual conversion, visual-to-audio conversion, audio self-generation, and visual self-generation. VAMG jointly optimizes modal reconstruction and adversarial constraints, effectively addressing the problems of structural alignment and signal compensation in incomplete videos. We conducted instrument-oriented and pose-oriented cross-modal audio-visual mutual generation experiments on the Sub-URMP (a subset of the University of Rochester Musical Performance) dataset to verify the effectiveness of the model.
Funding	Major Project for New Generation of AI [2018AAA0100400]; National Natural Science Foundation of China [61836014]; National Natural Science Foundation of China [U21B2042]; National Natural Science Foundation of China [62072457]; National Natural Science Foundation of China [62006231]; InnoHK Program
WOS research areas	Computer Science; Engineering
Language	English
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number	WOS:000782832800001
Funding agencies	Major Project for New Generation of AI; National Natural Science Foundation of China; InnoHK Program
Content type	Journal article
Source URL	http://ir.ia.ac.cn/handle/173211/48357
Collection	Institute of Automation / Center for Research on Intelligent Perception and Computing
Affiliations	1. Chinese Acad Sci CASIA, Ctr Res Intelligent Percept & Comp CRIPAC, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing 100190, Peoples R China
	2. Univ Chinese Acad Sci, Ctr Res Intelligent Percept & Comp, Inst Automat, Beijing 100190, Peoples R China
	3. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 101408, Peoples R China
	4. Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
	5. Univ Chinese Acad Sci UCAS, Beijing 100190, Peoples R China
Recommended citation (GB/T 7714)	Hao, Wangli, Guan, He, Zhang, Zhaoxiang. VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022: 13.
APA	Hao, Wangli, Guan, He, & Zhang, Zhaoxiang. (2022). VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 13.
MLA	Hao, Wangli, et al. "VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation." IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022): 13.