Modeling Speaker Variability Using Long Short-Term Memory Networks for Speech Recognition
Li, Xiangang ; Wu, Xihong
2015
Keywords: speech recognition; long short-term memory; d-vector; speaker adaptation; i-vector; deep neural networks; ADAPTATION; TRANSFORMATIONS; LSTM
Abstract: Speaker adaptation of acoustic models based on deep neural networks (DNNs) is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). First, the LSTM RNNs are used to extract d-vectors (deep vectors), which are then concatenated with the raw features as input to the acoustic models. The speaker information provided by the d-vectors helps DNN-based acoustic models learn speaker normalization during training. Furthermore, motivated by the idea that the speech message can also be useful for speaker recognition, a new network called cross-LSTM is proposed, which consists of two LSTMs: one for classifying speakers and the other for classifying senones. As a result, speaker recognition and speech recognition are conducted simultaneously. Experiments are conducted on a conversational telephone speech corpus. Experimental results show that the proposed models are effective in alleviating speaker variability in ASR and yield a 6% relative improvement for the LSTMP RNN based systems.
Indexed in: CPCI-S(ISTP)
Author emails: lixg@cis.pku.edu.cn; wxh@cis.pku.edu.cn
Pages: 1086-1090
Language: English
Source: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015)
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/450149
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)
Recommended citation format
GB/T 7714
Li, Xiangang, Wu, Xihong. Modeling Speaker Variability Using Long Short-Term Memory Networks for Speech Recognition. 2015-01-01.