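Title: 汉语语音识别中随机段模型优化算法研究 (Research on Optimization Algorithms for the Stochastic Segment Model in Mandarin Speech Recognition)
Author: 晁浩 (Chao Hao)
Degree type: Doctor of Engineering
Defense date: 2012-05-30
Degree-granting institution: Graduate University of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Supervisor: 刘文举 (Liu Wenju)
Keywords: speech recognition; stochastic segment model; speaker adaptation; tone; co-articulation
Alternative title: Optimization Algorithm of Stochastic Segment Modeling in Mandarin Speech Recognition
Degree discipline: Control Theory and Control Engineering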
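Chinese abstract: The hidden Markov model (HMM) is currently the most widely used acoustic model in continuous speech recognition, but its assumption that speech frames are mutually independent does not match the true distribution of speech signals. Researchers have therefore proposed several alternative models, one of which is the stochastic segment model (SSM). Compared with the HMM, the SSM is a more accurate model, and suprasegmental information can be incorporated into it more easily. However, large vocabulary continuous speech recognition (LVCSR) systems based on the SSM suffer from key problems that limit their practical use, such as high computational complexity and slow decoding. To reduce the computational complexity of SSM decoding and further improve model accuracy, this dissertation makes the following main contributions. (1) The differences between the HMM and the SSM are analyzed, and maximum likelihood linear regression (MLLR), a speaker adaptation method commonly used in HMM systems, is introduced into the SSM system; the recognition error rate falls by 7.5% relative. The experiments show that MLLR is also effective in an SSM system. (2) A preliminary framework is proposed that detects time points with phonetic significance, uses them to infer boundary information and initial/final class information for the neighboring speech segments, and then uses this information to guide the SSM search. In the experiments, two types of time points are detected fairly accurately and used to guide decoding; decoding time drops substantially with only a slight decrease in recognition accuracy. (3) Because the SSM can make better use of suprasegmental information, tone information is introduced into the SSM system to improve its performance. The method analyzes how the articulatory characteristics of Mandarin affect the fundamental frequency contour and uses hierarchical artificial neural networks to obtain articulatory features. Articulatory and prosodic features are used together to build explicit tone models, which are then applied in the one-pass decoding of the SSM, with good results. (4) To address the variability of the speech signal caused by co-articulation, intra-syllable diphone models are trained to initialize the parameters of syllable-based acoustic models, which handles co-articulation between the initial and the final within a syllable; the SSM is used as an inter-syllable transition model to alleviate co-articulation between syllables.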
English abstract: The hidden Markov model (HMM) is currently widely used in speech recognition systems and achieves good performance. However, the assumptions of the HMM, such as constant statistics within an HMM state and conditionally independent observations, are not realistic for sequences of speech spectra. To overcome these limitations, several alternative models have been proposed, one of which is the stochastic segment model (SSM). An SSM-based speech recognition system can obtain higher accuracy and exploit suprasegmental information more effectively. However, the higher computational complexity of the SSM hinders its further development in speech recognition. This dissertation focuses on modeling and decoding with the SSM, and the main research work covers the following aspects. (1) The theory of the maximum likelihood linear regression (MLLR) adaptation method is extended to the SSM, and an SSM-based MLLR adaptation method is derived. A continuous speech recognition experiment using the SSM-based MLLR adaptation method yields about a 7.5% relative improvement over the speaker-independent (SI) system and shows that the SSM-based MLLR method can also improve recognition performance. (2) A framework is proposed that incorporates landmarks into an SSM-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To validate the method, two kinds of landmarks that can be detected reliably are used to direct the decoding process of the SSM-based Mandarin LVCSR system. Much decoding time is saved without an obvious decrease in recognition accuracy. (3) The influence of articulatory characteristics on the pitch contour is investigated, and hierarchical MLP classifiers are used to obtain articulatory features. The articulatory features are then exploited as a form of tonal feature for tone modeling. Finally, the tone models are fused into the SSM-based speech recognition system, taking advantage of a property of segmental models: their structure favours suprasegmental information. (4) In view of the acoustic variability caused by co-articulation, syllable-based acoustic models are constructed and initialized from intra-syllable initial/final-based diphones to capture the intra-syllable co-articulation effect. Segmental models, which are regarded as inter-syllable transition models, are incorporated into the recognition system to capture inter-syllable co-articulation ef...
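Note on the MLLR result: the abstract does not give the SSM-specific derivation, so the following is only a minimal sketch of the standard MLLR mean update, in which one shared affine transform maps every speaker-independent Gaussian mean to an adapted mean, mu_hat = A mu + b. The function names, the toy matrices, and the error rates 20.0 and 18.5 are hypothetical values chosen for illustration; they are not taken from the dissertation. Estimating A and b from adaptation data by maximum likelihood is not shown.

```python
import numpy as np


def mllr_transform_means(means, A, b):
    """Apply one shared affine transform mu_hat = A @ mu + b to each mean.

    means: array of shape (n_gaussians, dim), speaker-independent means
    A:     array of shape (dim, dim), regression matrix (assumed given)
    b:     array of shape (dim,), bias vector (assumed given)
    """
    means = np.asarray(means, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Row-wise form of A @ mu + b for every row mu of `means`.
    return means @ A.T + b


def relative_error_reduction(baseline_err, adapted_err):
    """Relative drop in error rate, e.g. 0.075 for a 7.5% relative reduction."""
    return (baseline_err - adapted_err) / baseline_err


# Hypothetical toy values, for illustration only.
si_means = np.array([[1.0, 2.0], [0.0, -1.0]])
A = np.array([[1.0, 0.1], [0.0, 0.9]])
b = np.array([0.5, -0.2])

adapted_means = mllr_transform_means(si_means, A, b)
reduction = relative_error_reduction(20.0, 18.5)
```

With these made-up numbers, an error rate falling from 20.0% to 18.5% corresponds to a 7.5% relative reduction, which is how a figure such as the one quoted in the abstract is normally read: relative to the baseline error, not as an absolute drop of 7.5 points.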
Language: Chinese
Other identifier: 200818014628002
Content type: Dissertation
Source URL: [http://ir.ia.ac.cn/handle/173211/6462]
Collection: Graduates, Doctoral Dissertations
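Recommended citation
GB/T 7714
晁浩 (Chao Hao). 汉语语音识别中随机段模型优化算法研究[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences. 2012.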
 
