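Title: 多语言语音识别技术研究
Author: 于胜民
Degree type: Doctor of Engineering
Defense date: 2005-05-01
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Advisor: 徐波
Keywords: speech recognition; Chinese recognition; English recognition; Japanese recognition; multilingual recognition; Speech Recognition; CSR; ESR; JSR; Multilingual; Bilingual
Alternative title: Research of Multilingual Speech Recognition
Degree discipline: Pattern Recognition and Intelligent Systems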
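Chinese abstract (translated): The main work and contributions of this thesis are as follows.
1. The techniques used to implement Chinese speech recognition are analyzed in depth, including feature extraction, decision-tree modeling, and the search framework of the recognizer. The effect of tone information on the Chinese recognition system is studied in detail from two angles: context-dependent modeling and acoustic features. In addition, a Chinese recognition system is rebuilt with phonemes as the modeling unit, which confirms by contrast the advantage of initial/final (声韵母) modeling.
2. The linguistic characteristics of English are analyzed in depth and mainstream English speech recognition techniques are surveyed in detail. An English recognition system is developed, covering initial model generation, question set design, decision-tree-based triphone model training, and the recognition search process. A Bayesian criterion is introduced into covariance modeling to determine the number of covariance transformation classes. A feature compensation algorithm in the log-spectral domain improves noise robustness without degrading recognition of clean speech. Accent adaptation for non-native speakers is also studied using a data-driven MLLR algorithm.
3. The pronunciation and linguistic characteristics of Japanese are analyzed in depth and the basic acoustic modeling units for Japanese are defined. Using decision-tree-based triphone modeling, a Japanese speech recognition system is developed rapidly. An endpoint detection algorithm based on statistical methods is proposed; it estimates the endpoint thresholds from a statistical point of view and is fairly robust to noise. Cross-language recognition from Chinese, English, and Chinese-English bilingual models to Japanese is also examined, and some preliminary experimental results are given.
4. One difficulty in multilingual speech recognition is how to control the sharp growth in the number of modeling units that comes with an enlarged set of recognition units. Taking Chinese and English as the objects of study, the thesis examines mixed acoustic modeling for Chinese-English bilingual recognition in detail. It compares the strengths and weaknesses of several methods, from direct merging of the two languages' modeling units, to IPA mapping, to automatic clustering based on different distance measures (Bhattacharyya distance, likelihood distance, and maximum mutual information distance), and arrives at an effective approach to bilingual modeling. Language-related questions are introduced to improve the ordinary decision-tree modeling algorithm, so that node splitting can proceed more easily and the accuracy of acoustic modeling improves to some extent.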
English abstract: The main contributions of this thesis are as follows. A Chinese Speech Recognition (CSR) system with the phoneme as the base model was developed, based on a detailed study of our CSR technologies. The importance of Chinese tone information was shown from two aspects of speech recognition: features and models. The characteristics of English were first studied intensively, and then our English Speech Recognition (ESR) system was developed, including initial model training, question set design, decision-tree-based triphone model training, and the search process of the recognizer. Semi-tied covariance modeling was then improved by using the more robust Bayes information criterion to decide the number of covariance transformation matrices. Compensation in the log-spectral domain was also investigated to obtain a more robust acoustic model. Finally, non-native speaker adaptation was tested with a data-driven maximum likelihood linear regression (MLLR) fast adaptation algorithm. A Japanese Speech Recognition (JSR) system was developed rapidly with the fast bootstrapping method of MSR. An end-point detection algorithm based on statistics is then proposed; this algorithm is more robust to noisy speech than others. Simple tests of cross-language speech recognition from Chinese, English, and a Chinese-English bilingual system to Japanese were also carried out. The results showed that the bilingual acoustic model performed better than the language-dependent models. Several Chinese-English bilingual acoustic modeling techniques were explored intensively, including direct combination of the two sets of base models, IPA mapping, and automatic agglomerative clustering with different distance measures, e.g., Bhattacharyya distance, log-likelihood, and maximum mutual information (MMI). Language-related questions were adopted in the decision tree training process and achieved higher performance than the traditional method.
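As an illustration of one of the distance measures named in the abstract, the following is a minimal sketch of the Bhattacharyya distance between two Gaussian models. It assumes each modeling unit is summarized by a single diagonal-covariance Gaussian; the abstract does not say how the thesis represents units for clustering, so this form, the function name `bhattacharyya_diag`, and the toy numbers are assumptions made for illustration only.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Assumption: each acoustic unit is summarized by one Gaussian with a
    diagonal covariance, given as per-dimension means and variances.
    """
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same dimension")
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        # Mean-separation term, scaled by the averaged variance.
        dist += 0.125 * (m1 - m2) ** 2 / v
        # Variance-mismatch term; zero when v1 == v2.
        dist += 0.5 * (math.log(v) - 0.5 * (math.log(v1) + math.log(v2)))
    return dist


# Toy, made-up statistics for two hypothetical units (not from the thesis).
unit_a = ([0.0, 1.0], [1.0, 2.0])
unit_b = ([0.5, 1.5], [1.0, 2.0])
print(bhattacharyya_diag(*unit_a, *unit_b))
```

In an agglomerative clustering of bilingual units, a measure like this would be evaluated for every pair of candidate units and the closest pair merged first; the merging rule and stopping criterion are left out here because the abstract does not describe them.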
Language: Chinese
Other identifier: 200118014604898
Content type: Thesis
Source URL: [http://ir.ia.ac.cn/handle/173211/5866]
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
于胜民. 多语言语音识别技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2005.