Research on Speech Quality Enhancement in Narrowband Communication Channel

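Title	面向窄带通信信道的语音质量增强问题研究
Author	Liu Bin (刘斌)
Degree type	Doctor of Engineering
Defense date	2015-05-26
Degree-granting institution	University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Tao Jianhua (陶建华)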
Keywords	speech activity detection; single-channel speech enhancement; very low bit rate speech coding; speech bandwidth extension; deep neural network
Alternative title	Research on Speech Quality Enhancement in Narrowband Communication Channel
Degree discipline	Pattern Recognition and Intelligent Systems
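Abstract (translated from Chinese)	In narrowband speech communication systems, an important problem is how to make full use of bandwidth resources to transmit and enhance speech signals effectively in complex environments, so as to guarantee the speech quality of the system. Real communication environments contain many kinds of random noise interference; under these conditions, accurately detecting the speech signal and effectively separating speech from noise is an urgent need in all kinds of speech communication systems. Many mainstream speech enhancement algorithms struggle to estimate the speech information accurately under non-stationary noise, and the quality of the enhanced speech is unsatisfactory. Meanwhile, in some narrowband communication channels, limits on bit resources and bandwidth mean that speech can only be transmitted at very low bit rates; an effective very low bit rate speech coding algorithm is therefore widely needed in both wireless narrowband communication systems and underwater acoustic communication systems, yet the quality of coded speech degrades severely as the bit rate drops. In addition, the loss of the high-frequency band in narrowband channels directly affects the naturalness of speech. Research on improving the speech quality of narrowband speech communication systems in real environments therefore has considerable theoretical significance and practical value, and is also a challenging topic. To address the problems above, this thesis explores speech quality enhancement in narrowband speech communication systems in depth. The main work and contributions are as follows. (1) A real-time speech activity detection algorithm for noisy environments is proposed. The method fuses sub-band spectral envelope features with sub-band long-term signal variability features to make its decision, distinguishing speech segments from non-speech segments under various noise conditions. To improve performance, the method analyzes both features only within the sub-bands that reflect formant characteristics. It is a low-complexity, unsupervised speech activity detection algorithm that requires no pre-trained model. Experimental results show that it outperforms several baseline methods at detecting speech under different noise conditions and can be applied in practical speech communication systems. (2) A single-channel speech enhancement algorithm based on an analysis-synthesis framework is proposed. An improved multi-band comb-filtering method is used to compute the pitch period and to determine the voiced/unvoiced state of each sub-band, improving the accuracy of pitch period extraction in noisy environments. A deep neural network model is introduced to enhance the line spectral pair parameters, reducing the reconstruction error of spectral parameter enhancement. The improved pitch period estimation method and the line spectral pair enhancement method are then applied within the analysis-synthesis speech enhancement algorithm. Experimental results show that this algorithm outperforms various traditional speech enhancement methods and effectively removes musical noise. In addition, applying the improved pitch period estimation and line spectral pair enhancement methods directly to a parametric speech coding algorithm improves the quality of speech compressed at low bit rates in noisy environments. (3) A single-channel speech enhancement algorithm based on deep learning is proposed. A deep neural network model is used to establish the mapping between the log-power spectrum of noisy speech and that of clean speech; the generalization ability of the model improves the robustness of the enhancement algorithm in noisy environments and thereby improves the quality of enhanced speech under non-stationary noise. Taking the features of neighboring frames into account further improves the robustness of the model, and an effective post-processing method further improves performance. In addition, for specific noise environments, an environment adaptation method based on the deep neural network model is explored so that the model better suits a particular environment. Experimental results show that speech enhanced with this deep neural network model has better quality than speech enhanced with traditional methods. (4) A very low bit rate parametric speech coding algorithm for narrowband communication channels is proposed, at 2.4kb...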
English abstract	In narrowband speech communication systems, one of the important problems is how to make use of limited bandwidth resources to enhance and transmit speech signals in complex environments in order to improve speech quality. Real environments contain many kinds of noise, so there is an urgent need to detect the speech signal accurately and to separate it efficiently from background noise. Many mainstream speech enhancement algorithms have difficulty estimating the speech signal accurately, especially in non-stationary noise, and the enhanced speech cannot meet the requirements. Some narrowband communication channels limit bit resources and bandwidth, so the speech signal has to be transmitted at a very low bit rate. It is therefore important to design a very low bit rate speech coding algorithm that can be applied to wireless communication systems and underwater acoustic communication systems; however, speech quality degrades as the coding rate decreases. In addition, the spectral information of the high band is lost in narrowband communication channels, which reduces speech naturalness. Research on improving the speech quality of narrowband speech communication systems in real environments is therefore significant, and it is also a highly challenging topic. To solve the above problems, this thesis focuses on in-depth research into speech quality enhancement in narrowband speech communication systems. The research content and innovations are as follows. A robust speech activity detection algorithm for noisy environments is proposed. The sub-band temporal envelope and the sub-band long-term signal variability are combined to distinguish speech segments from non-speech segments in noisy environments. To improve performance, both features are extracted only from the sub-bands that reflect formant characteristics. This is a low-complexity, unsupervised speech activity detection algorithm; no pre-trained model is required. Experimental results show that the proposed speech activity detection algorithm is superior to several baseline methods in different environments and that it could be applied in speech communication systems. A single-channel speech enhancement algorithm based on an analysis-synthesis framework is proposed. An improved pitch detection algorithm based on multi...
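Language	Chinese
Other identifier	201118014628047
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/6693]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Liu Bin (刘斌). 面向窄带通信信道的语音质量增强问题研究 [Research on Speech Quality Enhancement in Narrowband Communication Channel][D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2015.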