Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
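Title: 基于匹配跟踪的低位率语音编码研究
Author: Zhang Wenyao (张文耀)
Defense date: 2002
Major: Computer Application Technology
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctorate (PhD)
Keywords: matching pursuit; speech coding; speech enhancement; sinusoidal modeling; pitch estimation; psychoacoustic model
Other title (English): Study on Matching Pursuit Based Low-bit Rate Speech Coding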
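Abstract: Following the direction of sinusoidal modeling and sinusoidal analysis, this thesis applies matching pursuit techniques, combined with a psychoacoustic model, to study new modeling methods and the quantization and coding of model parameters, exploring low-bit-rate speech coding and related problems. It makes the following original contributions:
1. Matching pursuit is applied to speech signal enhancement, and a method is given for determining the coherent-ratio threshold used during matching-pursuit enhancement.
2. Sinusoidal modeling based on matching pursuit is studied; the concepts of a dynamic masking threshold and a perceptual gradient are proposed, together with a perceptual-gradient sinusoidal modeling algorithm.
3. For quantizing and coding the sinusoidal model parameters, vector quantization of the amplitude parameters and differential quantization of the frequency parameters are proposed, and a frequency-bin quantization model as well as random-phase and zero-phase models are discussed.
4. Aiming at lower bit rates and higher speech quality, a series of compression schemes is developed by stepwise refinement, culminating in an integrated coding scheme at 1.5 to 2.4 kbps. Covering different modeling methods and parameter quantization techniques, the thesis examines compression based on plain matching-pursuit sinusoidal modeling, compression based on perceptual-gradient sinusoidal modeling, compression based on dynamic-dictionary matching pursuit, compression with classified dynamic dictionaries, and an integrated scheme combining perceptual-gradient sinusoidal modeling with classified dynamic dictionaries.
5. The CAMDF function is proposed, along with CAMDF-based speech classification and pitch estimation algorithms, which are used in the thesis's compression schemes.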
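Item 5 does not spell out CAMDF. A minimal sketch, assuming CAMDF is the circular average magnitude difference function, in which the lag index wraps modulo the frame length N so that every lag τ sums over all N samples: D(τ) = Σ_{n=0}^{N-1} |x[(n+τ) mod N] - x[n]|. The ordinary (linear) AMDF sums only N - τ terms, so its values tend to fall as τ grows; wrapping removes that trend. The helper names `camdf` and `estimate_pitch` and the 60 to 400 Hz search range are hypothetical, and the thesis's classification and error-correction steps are not reproduced here.

```python
import numpy as np


def camdf(frame):
    """Circular AMDF (assumed definition): D[tau] = sum_n |x[(n + tau) % N] - x[n]|.

    Every lag uses all N samples, so D does not shrink as tau grows.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # np.roll(x, -tau)[i] == x[(i + tau) % n]
    return np.array([np.abs(np.roll(x, -tau) - x).sum() for tau in range(n)])


def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Hypothetical pitch picker: F0 from the deepest CAMDF valley in [fs/f_max, fs/f_min].

    np.argmin returns the first (shortest) lag on exact ties.
    """
    d = camdf(frame)
    lo = int(fs // f_max)                      # shortest lag searched
    hi = min(int(fs // f_min), len(d) - 1)     # longest lag searched
    tau = lo + int(np.argmin(d[lo:hi + 1]))
    return fs / tau
```

Multiples of the period also produce deep valleys, so on real speech a longer-lag valley can win; practical detectors therefore add post-processing on top of the raw minimum search.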
English abstract: Speech coding technology already achieves high reconstructed-speech quality at high and medium bit rates. At low and very low bit rates, however, achieving high speech quality remains a challenging problem of theoretical significance and potential practical value. This has led many researchers to explore new methods toward this goal, such as sinusoidal modeling techniques and parameter quantization methods. Following the direction of sinusoidal modeling and sinusoidal analysis, this thesis adopts matching pursuit techniques together with a psychoacoustic model, explores novel methods for sinusoidal modeling and for quantizing the model parameters, and investigates low-bit-rate speech coding and its related problems. The major contributions of this thesis are as follows:
1. Matching pursuit is applied to speech enhancement, and a method is provided for determining the coherent-ratio threshold in the matching-pursuit enhancement procedure. With this method, noisy signals can be enhanced effectively over a rather wide range even when the statistical properties of the signal and noise are unknown.
2. Sinusoidal modeling based on matching pursuit is studied, and the concepts of a dynamic masking threshold and a perceptual gradient are proposed, together with an algorithm for sinusoidal modeling with perceptual gradient. The new method makes good use of the psychoacoustic model and increases the perceptual information contained in the synthesized signal as much as possible during modeling, which improves modeling efficiency. The quality of the speech synthesized in this way is rather high even when the model precision is low.
3. To encode the parameters of the sinusoidal model, vector quantization of the amplitude parameters and differential quantization of the frequency parameters are proposed and discussed. The frequency bin model, the random phase model, and the zero phase model are also discussed. All of these effectively reduce the coding bit rate.
4. Aiming at lower bit rates and better speech quality, a series of speech coding schemes is studied by gradual refinement, culminating in an integrated coding scheme at 1.5 to 2.4 kbps. Depending on the modeling method and quantization technique, the compression schemes discussed in this thesis include: compression based on general matching-pursuit sinusoidal modeling, compression based on sinusoidal modeling with perceptual gradient, compression based on dynamic-dictionary matching pursuit, a compression scheme using classified dynamic dictionaries, and an integrated scheme that combines sinusoidal modeling with perceptual gradient and classified dynamic dictionaries. These schemes show that matching-pursuit-based sinusoidal modeling has great potential for low-bit-rate speech coding and offers a new way to study the problem. The final scheme takes more psychoacoustic effects into account and exploits classification, dynamic dictionaries, and sinusoidal modeling with perceptual gradient. Both its bit rate and its speech quality are superior to those of some existing international coding schemes and standards.
5. A function named CAMDF is proposed, along with CAMDF-based algorithms for speech classification and pitch estimation, which are used in the coding schemes of this thesis. Because CAMDF overcomes the defect of the traditional AMDF, the new pitch detection algorithm not only effectively reduces estimation errors but also simplifies the detection process and improves the precision of the estimated value. Speech classification using CAMDF also achieves satisfactory results.
Finally, the key points of the thesis are summarized, the improvements still needed in the current research are analyzed, and suggestions for future work are given.
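Contributions 1 and 2 both build on greedy matching pursuit. A minimal sketch, assuming a hypothetical dictionary of unit-norm cosine and sine atoms on a uniform frequency grid (not the thesis's dynamic or classified dictionaries) and a fixed, hypothetical coherent-ratio threshold of 0.3: at each step the atom g with the largest |⟨r, g⟩| is subtracted from the residual r, and the loop stops once the coherent ratio max_g |⟨r, g⟩| / ‖r‖ drops below the threshold, i.e. once the residual no longer correlates strongly with any atom and looks noise-like. The sum of selected atoms (`approx`) then plays the role of the enhanced or modeled signal. The thesis's method for choosing the threshold and its perceptual-gradient selection rule are not reproduced; `sinusoid_dictionary`, `matching_pursuit`, `max_atoms`, and `energy_tol` are illustrative names.

```python
import numpy as np


def sinusoid_dictionary(n, n_freqs):
    """Hypothetical dictionary: unit-norm cosine and sine atoms of length n.

    Frequencies are w_k = pi * k / (n_freqs + 1) rad/sample, k = 1..n_freqs.
    Returns D with shape (n, 2 * n_freqs) and the frequency of each column.
    """
    t = np.arange(n)
    atoms, freqs = [], []
    for k in range(1, n_freqs + 1):
        w = np.pi * k / (n_freqs + 1)
        for g in (np.cos(w * t), np.sin(w * t)):
            atoms.append(g / np.linalg.norm(g))
            freqs.append(w)
    return np.column_stack(atoms), np.array(freqs)


def matching_pursuit(x, D, coherence_threshold=0.3, max_atoms=50, energy_tol=1e-9):
    """Greedy matching pursuit with a coherent-ratio stopping rule.

    Returns (picks, approx, residual): picks is a list of
    (column index, coefficient) pairs and approx = x - residual.
    """
    x = np.asarray(x, dtype=float)
    r = x.copy()
    x_norm = np.linalg.norm(x)
    picks = []
    for _ in range(max_atoms):
        r_norm = np.linalg.norm(r)
        if r_norm <= energy_tol * x_norm:
            break  # residual is numerically zero
        c = D.T @ r  # inner products <r, g> with every atom
        j = int(np.argmax(np.abs(c)))
        if abs(c[j]) / r_norm < coherence_threshold:
            break  # coherent ratio too low: treat the residual as noise
        picks.append((j, float(c[j])))
        r = r - c[j] * D[:, j]  # remove the selected atom's contribution
    return picks, x - r, r
```

Stopping on the coherent ratio rather than on a fixed atom count lets the number of atoms adapt to how much structure a frame contains; the `energy_tol` guard covers the case where the dictionary represents the signal exactly.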
Language: Chinese
Content type: Doctoral dissertation
URI: http://ir.iscas.ac.cn/handle/311060/6196
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
LW011195.pdf (2919 KB), access: restricted; contact the repository to obtain the full text.

Recommended Citation:
Zhang Wenyao (张文耀). 基于匹配跟踪的低位率语音编码研究 [D] (doctoral dissertation). Institute of Software, Chinese Academy of Sciences, 2002-01-01.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences