Research on Multilingual Speech Recognition Technology (多语言语音识别技术研究)

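Title	Research on Multilingual Speech Recognition Technology (多语言语音识别技术研究)
Author	于胜民 (Yu Shengmin)
Degree type	Doctor of Engineering
Defense date	2005-05-01
Degree-granting institution	Graduate School of the Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	徐波 (Xu Bo)
Keywords	Speech Recognition; Chinese Speech Recognition (CSR); English Speech Recognition (ESR); Japanese Speech Recognition (JSR); Multilingual Recognition; Bilingual
Other title	Research of Multilingual Speech Recognition
Degree discipline	Pattern Recognition and Intelligent Systems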
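Abstract (translated from Chinese)	The main content and contributions of this thesis are as follows:
1. The implementation techniques of Chinese speech recognition, such as feature extraction, decision-tree modeling, and the search framework of the recognizer, are analyzed in depth. The influence of tone information on a Chinese recognition system is studied in detail from two aspects: context-dependent modeling and acoustic features. In addition, a Chinese recognition system is rebuilt with phonemes as the modeling unit, which confirms by contrast the advantage of initial/final modeling.
2. The linguistic characteristics of English are analyzed in depth and mainstream English speech recognition techniques are examined in detail. An English recognition system is developed, covering initial model generation, question-set design, decision-tree-based triphone model training, and the recognition search process. A Bayesian criterion is introduced into covariance modeling to determine the number of covariance transformation classes. A feature compensation algorithm in the log-spectral domain improves the noise robustness of the system without degrading recognition of clean speech. In addition, the data-driven MLLR algorithm is used to study accent adaptation for non-native pronunciation.
3. The pronunciation and linguistic features of Japanese are analyzed in depth, the basic acoustic modeling units for Japanese are defined, and decision-tree-based triphone modeling is used to rapidly develop our Japanese speech recognition system. A statistics-based endpoint detection algorithm is proposed; it estimates the endpoint threshold from a statistical viewpoint and is fairly robust to noise. In addition, cross-language recognition of Japanese from Chinese, English, and Chinese-English bilingual models is examined, and some preliminary experimental results are given.
4. One difficulty of multilingual speech recognition is how to control the sharp increase in modeling units that comes with an enlarged set of recognition units. Taking Chinese and English as the objects of study, mixed acoustic modeling for Chinese-English bilingual speech is studied in detail. From directly merging the modeling units of the two languages, to IPA mapping, to automatic clustering algorithms based on different distance measures (Bhattacharyya distance, likelihood distance, and maximum mutual information distance), the advantages and disadvantages of each method are examined and an effective approach to bilingual modeling is identified. Language-related questions are introduced to further improve the ordinary decision-tree modeling algorithm, so that question-based splitting proceeds more easily and the accuracy of acoustic modeling improves to some extent.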
Abstract (English)	The main work of this thesis is as follows. A Chinese Speech Recognition (CSR) system with the phoneme as the base model was developed, based on a detailed study of our CSR technologies. The importance of Chinese tone information was shown from two aspects of speech recognition: features and models. The characteristics of English were first studied intensively, and then our English Speech Recognition (ESR) system was developed, including initial model training, question set design, decision-tree-based triphone model training, and the search process of the recognizer. Semi-tied covariance modeling was then improved by using a more robust Bayesian criterion to decide the number of covariance transformation matrices. Compensation in the log-spectral domain was also investigated to obtain a more robust acoustic model. Finally, non-native speaker adaptation was tested with the data-driven maximum likelihood linear regression (MLLR) fast adaptation algorithm. A Japanese Speech Recognition (JSR) system was developed rapidly with the fast bootstrapping method of MSR. An end-point detection algorithm based on statistics was then proposed; it is more robust to noisy speech than other algorithms. Finally, simple tests of cross-language speech recognition from Chinese, English, and a Chinese-English bilingual system to Japanese were carried out. The results showed that the bilingual acoustic model performed better than the language-dependent models. Several Chinese-English bilingual acoustic modeling techniques were explored intensively, such as direct combination of the two sets of base models, IPA mapping, and automatic agglomerative clustering with different distance measures, e.g., Bhattacharyya distance, log-likelihood, and maximum mutual information (MMI). Language-related questions were adopted in the decision tree training process and achieved higher performance than the traditional method.
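Language	Chinese
Other identifier	200118014604898
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/5866]
Collection	Graduates: Doctoral Dissertations
Recommended citation (GB/T 7714)	于胜民 (Yu Shengmin). Research on Multilingual Speech Recognition Technology (多语言语音识别技术研究)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2005.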
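
Illustrative note: the abstract names the Bhattacharyya distance as one of the measures used to cluster Chinese and English modeling units automatically. The sketch below is not taken from the thesis. It only shows, under the assumption that each unit is summarized by a single diagonal-covariance Gaussian, how such a distance could be computed and how the closest pair of units (the first merge candidate in an agglomerative clustering) might be found. The unit names (`zh_a`, `en_aa`, `en_s`) and their parameters are hypothetical toy values.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same dimension")
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        avg = 0.5 * (v1 + v2)  # averaged variance in this dimension
        dist += 0.125 * (m1 - m2) ** 2 / avg              # mean separation term
        dist += 0.5 * math.log(avg / math.sqrt(v1 * v2))  # variance mismatch term
    return dist


def closest_pair(units):
    """Return (distance, name_a, name_b) for the two most similar units."""
    names = sorted(units)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = bhattacharyya_diag(*units[a], *units[b])
            if best is None or d < best[0]:
                best = (d, a, b)
    return best


# Hypothetical toy units: name -> (mean vector, variance vector).
units = {
    "zh_a": ([1.0, 0.0], [1.0, 1.0]),
    "en_aa": ([1.2, 0.1], [1.0, 1.0]),
    "en_s": ([-3.0, 2.0], [0.5, 2.0]),
}

distance, unit_a, unit_b = closest_pair(units)
print(f"merge candidate: {unit_a} + {unit_b} (distance {distance:.5f})")
```

A full agglomerative procedure would merge the closest pair, re-estimate the merged unit's parameters, and repeat until a stopping criterion is met; those steps depend on choices the abstract does not specify.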
