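Title: 语音参数轨迹模型研究及其在可信度度量中的应用 (Research on Parametric Trajectory Models for Speech and Their Application in Confidence Measures)
Author: 张翼燕 (Zhang Yiyan)
Degree type: Doctor of Engineering
Defense date: 2004-08-26
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisors: 徐波 (Xu Bo); 刘文举 (Liu Wenju)
Keywords: speech recognition; parametric trajectory model; search; confidence measures
Alternative title: The Study of Parametric Trajectory Models and Its Applications in Confidence Measures
Degree discipline: Pattern Recognition and Intelligent Systems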
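Chinese abstract (translated): To date, the HMM has been the most successful and most widely used model in continuous speech recognition. To obtain efficient training and recognition algorithms, the HMM assumes that features are mutually independent, which does not match the actual distribution of speech signals. Researchers have therefore relaxed the independence assumption and proposed a more general class of models, segmental models. This dissertation studies parametric trajectory models comprehensively and in depth, covering modeling with integrated tone features, the search problem, and confidence measures. The main contributions are as follows. (1) A parametric trajectory model system was implemented and the model's probabilistic formulation was analyzed. When the order of the polynomial fit degenerates to 0, the model is equivalent to an HMM with explicit duration modeling. In the segmental-model experiments, data fitting verified that the parametric trajectory model models speech more accurately than the HMM. The parametric trajectory approach has a theoretical weakness when modeling silence; we assume that silence has an expected straight-line trajectory in order to model its parametric trajectory. We also found that the duration model is very important for recognizing silence that spans only a few frames. (2) Parametric trajectory modeling with tone features was investigated. The soft-integration method treats the fundamental frequency as a 14th feature dimension at the feature level; fitting its trajectory yields the tone of the speech segment. The physical meaning of the parametric trajectory model means that it can directly reflect how the fundamental-frequency contour is distributed in space. The hard-integration method combines the tone model with the acoustic model at the model level, exploiting the parametric trajectory model's nature as a segmental model: its framework incorporates segment-level features well during statistical recognition. (3) The parametric trajectory model achieves more accurate modeling than the HMM at the cost of higher computational complexity. To address this, a fixed-length parametric trajectory model is proposed. It resamples the points on the normalized-time trajectory into fixed regions, avoiding repeated probability computations for different time points in different segments. The dissertation also handles issues encountered while implementing the model, such as sentence-score normalization. With a slight drop (0.5%) in digit-string recognition rate, the fixed-length parametric trajectory model reduces computation time by a factor of about 90. (4) A confidence-measure approach that combines the parametric trajectory model with the HMM is proposed, overcoming shortcomings of traditional confidence methods. There are two concrete methods. The first applies scores: the trajectory-model score can be combined with or substituted for the HMM score, which introduces new information and also mitigates the HMM's imprecise description of the speech signal. The second fuses models: two different acoustic models appear in the same system, and their separate recognition results verify each other. This avoids the various problems caused by likelihood scores and allows comparison across different sentences. In the search stage, a new acoustic model (the parametric trajectory model) re-recognizes the speech segments in the word graph that are to be traced back; the original HMM result occupies a different position in this new sequence, and it receives …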
English abstract: To date, the HMM has been the most successful and widely used model in continuous speech recognition. To obtain efficient training and recognition algorithms, it assumes that feature vectors are independent. However, this assumption does not match the actual distribution of speech signals. Alternative models that attempt to overcome this difficulty have been proposed, usually known as segmental models. This dissertation thoroughly discusses Parametric Trajectory Models (PTM), including tone as a segmental feature in the model, search in continuous speech recognition, and the application of PTM to confidence measures. The main contributions are as follows. We implemented a PTM recognition system and analyzed its probability expressions. When the polynomial order R is zero, PTM degenerates to an HMM with explicit duration modeling. In the segmental modeling experiments, data fitting verified that PTM models speech more accurately than the HMM. For background silence, we assumed an expected linear trajectory even though its signal points are unrelated to time. The duration model was important for recognizing silence shorter than 5 frames. Tone was applied as a segmental feature in PTM. The soft integration used pitch as the 14th feature, alongside the MFCC features. Fitting its trajectory yields the tone of the speech segment. The mathematical nature of PTM means that soft integration can directly reflect how pitch is distributed in space. The hard integration method combined the tone model with the acoustic model, exploiting the property of PTM as a segmental model: its structure allows segmental feature measurements.
PTM modeled speech more accurately than the HMM at the expense of much higher computational complexity. To solve this problem, the dissertation proposed the Fixed-frame Parametric Trajectory Model (FPTM), which resampled the points on the normalized-time trajectory into fixed regions and thus avoided repeated probability calculations for different time points in different speech segments. FPTM cut computation by a factor of about 90 while digit-string accuracy fell by 0.5%. PTM was also applied to confidence measures. Two methods were introduced. One was the application of scores, which can be combined with or substituted for HMM scores. The other was the application of recognition results, which verified the HMM result. The former introduced new information and improved the description of the speech signal. The latter overcame the limitation of traditional HMM scores, which cannot be compared across sentences. In the A* search, we re-recognized the speech segments to be traced in the word lattice using PTM. The HMM result occupied a different position in the new recognition sequence and received a different confidence weight, so the priority ranking of the tracing paths was altered and recognition accuracy improved. In hypothesis testing, on the basis of PTM verification Fisher classifier
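The polynomial trajectory fit that the abstract refers to can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `design_matrix` and `fit_trajectory`, the least-squares estimator, and the per-dimension residual variance are assumptions made for illustration only. Each feature dimension of a segment is fitted with a polynomial of order R over time normalized to [0, 1].

```python
import numpy as np


def design_matrix(num_frames: int, order: int) -> np.ndarray:
    """Rows are [1, t, t^2, ..., t^order] at normalized times t in [0, 1]."""
    t = np.linspace(0.0, 1.0, num_frames)
    return np.vander(t, order + 1, increasing=True)


def fit_trajectory(segment: np.ndarray, order: int):
    """Least-squares polynomial trajectory for one segment (illustrative).

    segment: array of shape (num_frames, dim), one feature vector per frame.
    Returns (coeffs, residual_var), where coeffs has shape (order + 1, dim)
    and residual_var holds the per-dimension mean squared residual.
    """
    z = design_matrix(segment.shape[0], order)
    coeffs, *_ = np.linalg.lstsq(z, segment, rcond=None)
    residual = segment - z @ coeffs
    residual_var = (residual ** 2).mean(axis=0)
    return coeffs, residual_var
```

With `order=0` the design matrix is a single column of ones, so the fitted trajectory is the constant segment mean, which is consistent with the abstract's remark that the order-zero model degenerates to an HMM with explicit duration modeling.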
Language: Chinese
Other identifier: 836
Content type: Thesis
Source URL: http://ir.ia.ac.cn/handle/173211/5829
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
张翼燕. 语音参数轨迹模型研究及其在可信度度量中的应用[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2004.