Research on Speech Parametric Trajectory Models and Their Application in Confidence Measures


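Title	Research on Speech Parametric Trajectory Models and Their Application in Confidence Measures (语音参数轨迹模型研究及其在可信度度量中的应用)
Author	张翼燕
Degree type	Doctor of Engineering
Defense date	2004-08-26
Degree-granting institution	Graduate School of the Chinese Academy of Sciences
Place conferred	Institute of Automation, Chinese Academy of Sciences
Advisors	徐波 ; 刘文举
Keywords	speech recognition; Parametric Trajectory Model; search; Confidence Measures
Alternative title	The Study of Parametric Trajectory Models and Its Applications in Confidence Measures
Degree discipline	Pattern Recognition and Intelligent Systems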
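Chinese abstract	To date, the hidden Markov model (HMM) has been the most successful and widely used model in continuous speech recognition. To obtain efficient training and recognition algorithms, the HMM assumes that features are mutually independent, which does not match the actual distribution of speech signals. Researchers have therefore dropped the independence assumption and proposed a more general class of models, segmental models. This dissertation studies the parametric trajectory model comprehensively and in depth, covering modeling with tone integration, the search problem, and its use in confidence measures. The main contributions are as follows.
1. A parametric trajectory model system was implemented and the model's probabilistic formulation was analyzed. When the polynomial fitting order degenerates to 0, the model is equivalent to an HMM with explicit duration modeling. In segmental model experiments, data fitting verified that the parametric trajectory model has more accurate modeling ability than the HMM. The parametric trajectory approach has a theoretical deficiency when modeling silence; we assume that silence has an expected straight-line trajectory in order to model it. We also found that the duration model is very important for recognizing silence with a small number of frames.
2. Parametric trajectory modeling with tone features was explored. The soft integration method treats fundamental frequency as the 14th feature dimension at the feature level; fitting a trajectory to it yields the tone of the speech segment. The physical meaning of the parametric trajectory model means that it directly reflects the spatial distribution of the fundamental frequency contour. The hard integration method combines the tone model with the acoustic model at the model level, exploiting a property of the parametric trajectory model as a segmental model: its framework readily incorporates segment-level features during statistical recognition.
3. The parametric trajectory model achieves more accurate modeling than the HMM at the cost of higher computational complexity. To address this, a fixed-length parametric trajectory model is proposed. It resamples the points on the normalized-time trajectory into fixed regions, avoiding repeated probability computation for different time points in different segments. The dissertation also handles issues that arose during implementation, such as sentence score normalization. With a slight drop (0.5%) in digit-string recognition rate, the fixed-length model reduces computation time by a factor of about 90.
4. A confidence measure combining the parametric trajectory model with the HMM is proposed, overcoming shortcomings of traditional confidence methods. There are two specific approaches. The first uses scores, which can be combined with or substituted for HMM scores; this introduces new information and compensates for the HMM's imprecise description of the speech signal. The second fuses models: two different acoustic models coexist in the same system, and their respective recognition results verify each other. This avoids the problems that come with likelihood scores and allows comparison across sentences. In the search stage, speech segments in the word lattice that are to be traced back are re-recognized with a new acoustic model (the parametric trajectory model); the original HMM result occupies a different position in this new sequence, and it obtains…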
English abstract	To date, the HMM has been the most successfully and widely used model in continuous speech recognition. It assumes independence of feature vectors in order to obtain efficient training and recognition algorithms. However, this assumption does not accord with the actual distribution of speech signals. Alternative models that attempt to overcome this difficulty have been proposed, usually known as segmental models. This dissertation thoroughly discusses Parametric Trajectory Models (PTM), including tone as a segmental feature in the model, search in continuous speech recognition, and the application of PTM to confidence measures. The main contributions are as follows.
We implemented a PTM recognition system and analyzed its probability expressions. When the polynomial order R is zero, PTM degenerates to an HMM with explicit duration modeling. In multi-segment modeling, data-fitting experiments verified that PTM models speech more accurately than the HMM. For background silence, we assumed an expected linear trajectory, even though its signal points are unrelated to time. The duration model was important for recognizing silence segments shorter than 5 frames.
Tone was applied in PTM as a segmental feature. The soft integration method used pitch as the 14th feature, in the same way as the MFCC features; fitting its trajectory yields the tone of the speech segment. The mathematical nature of PTM means that soft integration directly reflects the distribution of pitch in feature space. The hard integration method combined the tone model with the acoustic model, using a property of PTM as a segmental model: its structure allows segmental feature measurements.
PTM models speech more accurately than the HMM at the expense of much higher computational complexity. To solve this problem, the dissertation proposes the Fixed-frame Parametric Trajectory Model (FPTM), which resamples the points on the normalized-time trajectory into fixed regions and thus avoids repeated probability calculations for different time points in different speech segments. FPTM reduced computation time by a factor of about 90, while digit-string accuracy fell by 0.5%.
PTM was also applied to confidence measures. Two methods were introduced. One was the application of scores, which can be combined with or substituted for HMM scores. The other was the application of recognition results, which verified the HMM result. The former introduced new information and improved the description of the speech signal. The latter overcame the limitation that traditional HMM likelihood scores cannot be compared between sentences. In A* search, we re-recognized the speech segments to be traced in the word lattice using PTM. The HMM result took a different position in the new recognition sequence and received a different confidence weight, so the priority ranking of tracing paths was altered and recognition accuracy improved. In hypothesis testing, on the basis of PTM verification Fisher classifier…
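Language	Chinese
Other identifier	836
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/5829]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	张翼燕. 语音参数轨迹模型研究及其在可信度度量中的应用 (The Study of Parametric Trajectory Models and Its Applications in Confidence Measures)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2004.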
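
As a rough illustration of two ideas named in the abstract, the sketch below fits a polynomial trajectory to one speech segment over normalized time and resamples a variable-length segment to a fixed number of frames. It is not the dissertation's implementation: this record does not give the actual equations, so the least-squares formulation, the use of linear interpolation for resampling, and all function names (`normalized_time`, `fit_trajectory`, `resample_fixed`) are assumptions made for illustration only.

```python
import numpy as np


def normalized_time(num_frames):
    """Map frame indices 0..N-1 onto the normalized time axis [0, 1]."""
    if num_frames == 1:
        return np.zeros(1)
    return np.arange(num_frames) / (num_frames - 1)


def fit_trajectory(segment, order):
    """Least-squares polynomial trajectory for one segment (assumed formulation).

    segment: (N, D) array of N feature frames with D dimensions.
    Returns (coeffs, residual_var): coeffs has shape (order + 1, D),
    residual_var is the per-dimension mean squared fitting error.
    """
    segment = np.asarray(segment, dtype=float)
    t = normalized_time(segment.shape[0])
    # Design matrix with columns t^0, t^1, ..., t^order.
    design = np.vander(t, order + 1, increasing=True)
    coeffs, _, _, _ = np.linalg.lstsq(design, segment, rcond=None)
    residual = segment - design @ coeffs
    return coeffs, np.mean(residual ** 2, axis=0)


def resample_fixed(segment, num_points):
    """Resample a segment to num_points frames along normalized time.

    Linear interpolation is an assumption; the record does not say
    how the fixed-length model performs its resampling.
    """
    segment = np.asarray(segment, dtype=float)
    src_t = normalized_time(segment.shape[0])
    dst_t = np.linspace(0.0, 1.0, num_points)
    columns = [np.interp(dst_t, src_t, segment[:, d])
               for d in range(segment.shape[1])]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    # Toy segment: 4 frames, 2 feature dimensions, both linear in time.
    seg = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
    mean_coeffs, mean_err = fit_trajectory(seg, order=0)
    line_coeffs, line_err = fit_trajectory(seg, order=1)
    print("order 0 coefficients:", mean_coeffs)
    print("order 1 residual variance:", line_err)
    print("fixed-length shape:", resample_fixed(seg, 7).shape)
```

With `order=0` the fit reduces to the segment mean, a constant trajectory, which is consistent with the abstract's remark that a zero-order model behaves like an HMM with explicit duration modeling. Resampling every segment to the same number of frames is one plausible way to let scores be computed on a shared set of time points.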
