Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
Yi, Jiangyan1; Tao, Jianhua2,3; Fu, Ruibo1; Wang, Tao1; Zhang, Chu Yuan1; Wang, Chenglong4
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Year: 2023
Volume: 31; Pages: 2963-2973
Keywords: Adversarial training; multi-task learning; prosodic boundaries; speech synthesis; multi-modal embeddings
ISSN: 2329-9290
DOI: 10.1109/TASLP.2023.3301235
Corresponding authors: Yi, Jiangyan (jiangyan.yi@nlpr.ia.ac.cn); Tao, Jianhua (jhtao@tsinghua.edu.cn)
Abstract: Prosodic boundaries are still crucial to the naturalness of end-to-end speech synthesis systems. This article proposes to use adversarial multi-task learning to predict prosodic boundaries. Adversarial multi-task learning is utilized to transfer knowledge from an auxiliary POS tagging task to a prosodic boundary prediction task. Furthermore, multi-modal embeddings are composed of contextual word and speech embedding features obtained from the pre-trained bidirectional encoder representations from transformers (BERT) model and Speech2Vec. We can utilize linguistic and acoustic information from large amounts of external text and speech data without prosodic boundary labels. At the inference stage, the prosodic boundary predicting model can use the syntactic features learnt from the POS tagging task without any extra computation cost due to only employing the prosodic boundary predicting task to decode. We conducted experiments on Mandarin datasets. The results show that the models using multi-modal embeddings from the pre-trained BERT and Speech2Vec outperform the models trained with single modal embedding. Furthermore, the models trained with adversarial training obtain further performance gains by up to 2.95% in F1 score.
Funding projects: National Natural Science Foundation of China (NSFC) [61831022]; National Natural Science Foundation of China (NSFC) [U21B2010]; National Natural Science Foundation of China (NSFC) [62101553]; National Natural Science Foundation of China (NSFC) [61971419]; National Natural Science Foundation of China (NSFC) [62006223]; National Natural Science Foundation of China (NSFC) [62276259]; National Natural Science Foundation of China (NSFC) [62201572]; National Natural Science Foundation of China (NSFC) [62206278]; Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science [Z211100004821013]
WOS keywords: SPEECH SYNTHESIS; SEQUENCE; MODEL
WOS research areas: Acoustics; Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:001045259400002
Funding organizations: National Natural Science Foundation of China (NSFC); Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science
Content type: Journal article
Source URL: http://ir.ia.ac.cn/handle/173211/53906
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations:
1. Chinese Acad Sci, Univ Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 101408, Peoples R China
2. Tsinghua Univ, Dept Automat, Beijing 100190, Peoples R China
3. Univ Sci & Technol China, Sch Artificial Intelligence, Hefei 230026, Peoples R China
4. Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Recommended citation:
GB/T 7714: Yi, Jiangyan, Tao, Jianhua, Fu, Ruibo, et al. Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 2963-2973.
APA: Yi, Jiangyan, Tao, Jianhua, Fu, Ruibo, Wang, Tao, Zhang, Chu Yuan, & Wang, Chenglong. (2023). Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 31, 2963-2973.
MLA: Yi, Jiangyan, et al. "Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings." IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 31 (2023): 2963-2973.