Institute of Software, Chinese Academy of Sciences Institutional Repository (ISCAS OpenIR)
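Title: Research on Web Information Dissemination Technology and Its Efficiency (Web信息分发技术及效率研究)
Author: Huang Tao (黄涛)
Defense date: 2000
Major: Computer Application Technology
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctorate
Keywords: Web information resources; speculative mechanism; PUSH method; reference locality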
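Abstract: Since December 1990, when the first Web software was created on a NeXT computer (the workstation made by Steve Jobs's company, NeXT), Web technology and its applications have spread worldwide at an astonishing pace and now reach into every area of work and daily life. With so many Web servers and so much Web information, and with a huge Internet user population sharing comparatively limited network bandwidth, using that bandwidth efficiently has become more important than ever and will decisively shape the future of Internet applications. Since the mid-to-late 1990s, the rapid growth of the Internet and its wide commercial use have made the conflict between efficiency and bandwidth even sharper. The WWW has in fact become today's de facto international standard platform for commercial communication, so research on the efficiency of information dissemination over the WWW is urgent. Drawing on the author's work on the topic "Pre- and post-processing technology for network information retrieval" under the national Ninth Five-Year Plan key science and technology program (96-743-01-01-05) and on developing the "Web information retrieval system" for the "Golden Bridge" (金桥) project, this thesis uses reliable multicast transport and caching to study Web information dissemination on the Internet and its efficiency. The main contributions are as follows:
1. Analysis of the techniques used by existing Web information dissemination systems and their shortcomings. The work studies multicast transport for information dissemination and concentrates on the efficiency of caching. It identifies two gaps: no existing scheme provides reliable multicast delivery of continuous data streams, and client-side caches lack statistics gathering, so users' Internet usage patterns and the distribution of Internet traffic cannot be characterized accurately.
2. Implementation of RMTP+, a reliable multicast transport protocol for continuous data streams suited to information publishing. To achieve reliable multicast of continuous data streams in point-to-multipoint communication, the thesis analyzes the strengths and weaknesses of existing multicast transport protocols for reliable delivery and improves on RMTP (a reliable multicast protocol). The stream to be sent is divided into data blocks, each made up of a set number of packets, and each block is multicast reliably with RMTP, so that the whole stream is delivered reliably. The implementation replaces RMTP's positive acknowledgments with negative acknowledgments, reducing the cost of processing acknowledgments. When the multicast tree is built, the multicast group is divided hierarchically into local groups, and within each group designated receivers handle that group's acknowledgment packets level by level. This relieves the sender of heavy acknowledgment processing and improves system performance. The thesis analyzes the theoretical performance of RMTP+, implements the protocol, measures its throughput in a real environment, and studies and explains the gap between the theoretical values and the experimental data.
3. Research and development of customized WWW traffic tracing and measurement. The explosion of WWW traffic makes it essential to understand precisely how the WWW is used, and in particular how users request WWW documents. To this end we collected client-side traces of WWW document requests that reflect the requests of thousands of users. To collect the traces we developed a technique for tracing WWW traffic at the client: by modifying the source code of the Netscape browser to add the required functions, we captured a large volume of client trace data, recording key parameters such as the document reference pattern (request model) and the time users actually spend accessing the WWW. Analysis of the data characterizes how users use the Internet, including the distribution of document sizes, the relationship between document popularity and size, how widely user requests are spread across documents, and the relationship between the number of references to a document and its popularity.
4. Discovery of three kinds of locality in WWW references and a speculative dissemination technique. By analyzing how clients access WWW resources, we observed three distinct kinds of reference locality among client requests: temporal locality, adjacency locality, and spatial locality. The literature shows that earlier work did not fully exploit the locality inherent in WWW traffic and used only the first two kinds; our study found that temporal and adjacency locality alone cannot make caching efficient enough. Building on those two, we also exploit spatial locality by drawing on knowledge of past access patterns, raising cache efficiency further.
5. Analysis of how the degree of data distribution affects dissemination efficiency, with a focus on the role of proxy servers. Where information is located inevitably affects access to it (for example, communication time and traffic volume). When designing a group of Web sites, arranging information so that remote clients can reach it efficiently is an increasingly important problem. This part of the work seeks a mechanism by which "popular" data moves automatically and dynamically, during dissemination, toward where users can access it most conveniently; the intended effect is that the more popular the data, the closer it sits to end users. The thesis proposes a new system model and analyzes its efficiency, with the goal of reducing repeated accesses to servers and thereby reducing user waiting time and the load on networks and servers.
6. An enhanced ICP scheme that uses summarized directory representations to share information about cache contents. Widespread Web cache sharing is hampered by the cost and overhead of ICP. Using directory information based on summarized representations, the thesis proposes an enhanced ICP, implements a prototype in the Squid proxy server, and compares performance in simulation experiments. The results show that the new protocol offers a preliminary solution to the efficiency problem of ICP over wide-area networks.
7. Design and development of a Web information dissemination prototype based on reliable multicast and PUSH technology. Its continuous-stream reliable multicast module multicasts files and data, and its PUSH dissemination prototype lets users register over the Internet the information they need and then actively delivers that information to those users by PUSH.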
English abstract: Since December 1990, when the first WWW software was created on a NeXT system, WWW technologies and applications have grown tremendously worldwide and now affect almost every field of work and daily life. With so many WWW servers and so much Web information on the Internet, and so many users sharing relatively scarce network bandwidth, using that limited bandwidth efficiently is ever more important and will be decisive for the future of the Internet. Since the mid-to-late 1990s, with the rapid growth and wide commercial use of the Internet, the WWW has become the de facto international standard communication platform, and its efficiency and bandwidth problems have become increasingly obvious. It is therefore urgent to study the efficiency of information distribution on the WWW. Against the background of a national Ninth Five-Year Plan key project (pre- and post-processing technology for Web information search) and the development of the Web information search system for the "Golden Bridge" project, this thesis studies Web information dissemination technology and its efficiency using reliable multicast transport and caching. The main contributions are:
1. An analysis of the technologies used in current Web information dissemination systems and their drawbacks. Current research mainly concerns multicast transport and caching, especially server-side caching, but it lacks techniques for accurately characterizing WWW traffic and gives too little attention to client-side caching and to reliable multicast transport.
2. RMTP+, a protocol for reliably disseminating continuous data streams. The thesis addresses the problem of reliably multicasting continuous data streams and analyzes the strengths and weaknesses of existing multicast protocols. Building on RMTP, we develop an improved protocol, RMTP+, to achieve reliable multicast. Negative acknowledgments replace positive acknowledgments to reduce the burden of processing acknowledgment information. When constructing the multicast tree, we divide the multicast group into several local groups and select designated receivers that hierarchically process the status information of the receivers in their local group. This relieves the Web information sender of a heavy burden and improves system performance. We also analyze the theoretical performance of RMTP+, implement the protocol, run several kinds of experiments, and collect experimental data in a real environment. We further examine the difference between the theoretical and measured values and explain its causes.
3. Customized WWW traffic tracing and measurement technology. Using the source code of the Netscape 3.0 browser, we captured a large amount of client trace data and recorded key parameters such as the user request model and the time spent accessing Web information. From the collected data we characterize how users use the Internet, including the distribution of Web document sizes.
4. Three types of reference locality and a speculative dissemination service. We found that exploiting only the first two of the three reference localities brings limited benefit, so we propose a new service that also exploits the third locality to further improve cache performance.
5. The effect of the degree of data distribution on dissemination efficiency, especially the role of proxy servers. Our goal is a mechanism that moves more popular data closer to end users, so that disseminating the same data costs relatively less. We also present a new system model and analyze its efficiency, aiming to eliminate repeated server accesses and to reduce user waiting time and the load on servers and networks.
6. An enhanced ICP approach that uses summarized directory information to share knowledge of cache contents. We implemented a prototype based on Squid and ran a series of comparative tests. The results show that the new approach provides a preliminary solution to the efficiency problem of deploying ICP over wide-area networks.
7. A Web information dissemination prototype system based on reliable multicast and push technologies. The reliable multicast transport module reliably transfers files and data. The information dissemination module lets users register the information they want and then delivers that information to the appropriate destinations using PUSH technology.
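Item 2 of the abstract describes cutting a continuous stream into blocks of packets and recovering losses with negative acknowledgments (NAKs). The following is a minimal, hypothetical sketch of that idea only; it is not the RMTP or RMTP+ wire protocol. The packet size, block size, the `drop` loss model, and all names (`split_stream`, `Receiver`, `send_block`) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    block: int      # index of the data block the packet belongs to
    seq: int        # position of the packet inside its block
    payload: bytes


def split_stream(data: bytes, packet_size: int, packets_per_block: int) -> list:
    """Cut a continuous byte stream into blocks of at most `packets_per_block` packets."""
    chunks = [data[i:i + packet_size] for i in range(0, len(data), packet_size)]
    blocks = []
    for b, start in enumerate(range(0, len(chunks), packets_per_block)):
        group = chunks[start:start + packets_per_block]
        blocks.append([Packet(b, s, p) for s, p in enumerate(group)])
    return blocks


@dataclass
class Receiver:
    received: dict = field(default_factory=dict)   # (block, seq) -> payload

    def deliver(self, pkt: Packet) -> None:
        self.received[(pkt.block, pkt.seq)] = pkt.payload

    def nak(self, block_id: int, size: int) -> list:
        """Negative acknowledgment: list only the sequence numbers still missing."""
        return [s for s in range(size) if (block_id, s) not in self.received]

    def reassemble(self, blocks) -> bytes:
        return b"".join(self.received[(p.block, p.seq)] for blk in blocks for p in blk)


def send_block(block, receivers, drop, max_rounds: int = 10) -> int:
    """Multicast one block, then resend NAKed packets until no receiver NAKs.

    `drop(receiver_index, packet, round_no)` is a hypothetical loss model that
    returns True when the packet is lost on the way to that receiver.
    Returns the number of transmission rounds used.
    """
    block_id = block[0].block
    pending = {i: list(range(len(block))) for i in range(len(receivers))}
    rounds = 0
    while any(pending.values()):
        if rounds == max_rounds:
            raise RuntimeError("block not delivered within max_rounds")
        rounds += 1
        # One multicast round carries the union of all requested packets.
        to_send = sorted(set().union(*pending.values()))
        for i, rx in enumerate(receivers):
            for s in to_send:
                if not drop(i, block[s], rounds):
                    rx.deliver(block[s])
        # An empty NAK list stands for a receiver that stays silent.
        pending = {i: rx.nak(block_id, len(block)) for i, rx in enumerate(receivers)}
    return rounds
```

With a loss model that drops a few packets only in the first round, each block needs exactly two rounds: the first multicast, then one resend of just the packets that were NAKed.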
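A back-of-envelope estimate shows why item 2's switch from positive to negative acknowledgments lowers the sender's load. The model is an assumption for illustration, not the thesis's performance analysis: $N$ receivers, blocks of $B$ packets, independent loss with probability $p$ at every receiver, and, for comparison, one positive acknowledgment per packet per receiver. Under NAKs a receiver speaks in the first round only if it lost at least one of the $B$ packets.

```latex
\begin{aligned}
\text{positive ACKs per block} &= N B \\
\mathbb{E}[\text{NAKs per block, first round}] &= N\bigl(1-(1-p)^{B}\bigr) \\
N=100,\ B=32,\ p=0.01:\quad N B &= 3200,
  \qquad N\bigl(1-0.99^{32}\bigr) \approx 100\,(1-0.725) \approx 27.5
\end{aligned}
```

Grouping receivers under designated receivers reduces the count further, because the sender then hears at most one merged report per local group.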
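Item 2 also describes splitting the multicast group into local groups whose designated receivers handle acknowledgments on the sender's behalf. A minimal, hypothetical sketch of that aggregation: each designated receiver merges the NAKs of its children and forwards a single message upward. The tree shape, the `Node` class, and `aggregate_nak` are illustrative assumptions, and local retransmission by designated receivers is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    missing: set = field(default_factory=set)      # packets this node itself lacks
    children: list = field(default_factory=list)   # non-empty => acts as designated receiver


def aggregate_nak(node: Node, inbox: dict) -> set:
    """Collect NAKs bottom-up through the local-group hierarchy.

    Each node handles at most one (already merged) NAK per direct child and
    forwards a single merged NAK upward. `inbox[name]` records how many NAK
    messages that node had to process.
    """
    merged = set(node.missing)
    handled = 0
    for child in node.children:
        child_nak = aggregate_nak(child, inbox)
        if child_nak:                 # an empty set means the child stays silent
            handled += 1
            merged |= child_nak
    inbox[node.name] = handled
    return merged
```

For a sender with three designated receivers, each serving four receivers that all lost something, the sender processes 3 merged NAKs instead of the 12 it would receive if every leaf receiver reported to it directly.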
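Item 3 lists statistics derived from the client traces: document popularity, how concentrated requests are, and how popularity relates to size. The thesis's actual trace format is not given here, so this sketch assumes a hypothetical record of one `(url, size_in_bytes)` pair per request and shows one plausible way to compute such summaries.

```python
from collections import Counter


def popularity_table(trace):
    """trace: (url, size_in_bytes) pairs, one per client request.
    Returns (url, request_count, size) rows, most popular first."""
    trace = list(trace)                          # allow one-shot iterators
    counts = Counter(url for url, _ in trace)
    sizes = {url: size for url, size in trace}   # last recorded size wins
    rows = [(url, n, sizes[url]) for url, n in counts.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def top_share(table, fraction):
    """Share of all requests that hit the most popular `fraction` of documents."""
    k = max(1, round(len(table) * fraction))
    total = sum(n for _, n, _ in table)
    return sum(n for _, n, _ in table[:k]) / total


def mean_size_split(table, threshold):
    """Mean size of documents requested >= `threshold` times vs. the rest."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0
    hot = [size for _, n, size in table if n >= threshold]
    cold = [size for _, n, size in table if n < threshold]
    return mean(hot), mean(cold)
```

On a toy trace where `/a` is requested 5 times out of 9, the top quarter of documents (just `/a`) accounts for 5/9 of all requests.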
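Item 4 says speculative dissemination exploits knowledge of past access patterns. The abstract does not give the prediction rule, so this sketch swaps in a common, simple technique: count how often document B follows document A within a short window of the same client's requests, and push B speculatively when the conditional frequency reaches a threshold. The `window` and `threshold` values and the `Predictor` class are hypothetical.

```python
from collections import defaultdict


class Predictor:
    """Learns which documents tend to follow a document within `window` later requests."""

    def __init__(self, window=2, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.seen = defaultdict(int)                          # A -> times A was requested
        self.follows = defaultdict(lambda: defaultdict(int))  # A -> B -> times B followed A

    def train(self, session):
        """session: URLs requested by one client, in order."""
        for i, a in enumerate(session):
            self.seen[a] += 1
            for b in set(session[i + 1:i + 1 + self.window]) - {a}:
                self.follows[a][b] += 1

    def predict(self, url):
        """Documents worth pushing speculatively after `url` is requested."""
        n = self.seen.get(url, 0)
        if n == 0:
            return []
        return sorted(b for b, c in self.follows[url].items() if c / n >= self.threshold)
```

After training on three sessions that start at `/index`, two of which continue to `/news`, a request for `/index` triggers a speculative push of `/news` only, because every other follower appeared in just one of three sessions.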
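Item 5 aims for "the more popular the data, the closer it sits to end users". One simple way to get that behavior, assumed here for illustration rather than taken from the thesis's system model, is for each proxy in a hierarchy to keep a copy once it has seen enough misses for a URL. The `Proxy` class, the per-level thresholds, and the hop count as a cost measure are all hypothetical; eviction is not modeled.

```python
class Proxy:
    """A cache node; the node with parent=None plays the origin server."""

    def __init__(self, name, parent=None, threshold=2):
        self.name = name
        self.parent = parent
        self.threshold = threshold      # hypothetical replication threshold
        self.store = set()              # URLs with a local copy
        self.misses = {}                # URL -> requests that missed at this node

    def get(self, url):
        """Return how many hops upstream the request travelled to find a copy."""
        if self.parent is None or url in self.store:
            return 0
        self.misses[url] = self.misses.get(url, 0) + 1
        hops = 1 + self.parent.get(url)
        if self.misses[url] >= self.threshold:
            self.store.add(url)         # popular here: keep a copy closer to users
        return hops


origin = Proxy("origin")
regional = Proxy("regional", origin, threshold=3)
local1 = Proxy("local1", regional, threshold=2)
local2 = Proxy("local2", regional, threshold=2)
```

Repeated requests for a hot document through `local1` cost 2, 2, 0, 0 hops; a second local proxy then benefits from the copy that has settled at the regional proxy, while a cold document still travels all the way to the origin.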
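Item 6 replaces per-request ICP queries with summarized directory information about each peer's cache. The abstract does not say which summary the thesis uses; this sketch assumes a Bloom filter, a common choice for such summaries. A proxy checks the peers' summaries locally and sends ICP queries only to peers whose summary may hold the URL. The bit-array size, hash count, and names are illustrative assumptions.

```python
import hashlib


class BloomSummary:
    """Compact, lossy summary of the URLs held in one proxy cache."""

    def __init__(self, m_bits=1024, k_hashes=4):
        if not 1 <= k_hashes <= 8:
            raise ValueError("this sketch derives at most 8 hashes from one SHA-256 digest")
        self.m, self.k = m_bits, k_hashes
        self.bits = 0                   # Python int used as an m-bit bit array

    def _positions(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        for i in range(self.k):
            # Take 4 of the digest's 32 bytes per hash function.
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, url):
        for p in self._positions(url):
            self.bits |= 1 << p

    def may_contain(self, url):
        """False means definitely absent; True means probably cached."""
        return all((self.bits >> p) & 1 for p in self._positions(url))


def peers_to_query(url, summaries):
    """Only peers whose summary may hold `url` receive an ICP query."""
    return [name for name, s in summaries.items() if s.may_contain(url)]
```

A Bloom filter never yields false negatives, so no cached copy is overlooked; the price is an occasional false positive, which costs one unnecessary ICP query.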
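Item 7's PUSH module lets users register the information they want and then delivers matching items to them. A minimal, hypothetical sketch of that register-then-push flow: topic strings stand in for whatever registration scheme the prototype used, and an in-memory `outbox` stands in for the actual delivery channel (in the prototype, presumably the reliable multicast module).

```python
from collections import defaultdict


class PushService:
    def __init__(self):
        self.by_topic = defaultdict(set)   # topic -> registered user ids
        self.outbox = defaultdict(list)    # user id -> items pushed to that user

    def register(self, user, topics):
        for t in topics:
            self.by_topic[t.lower()].add(user)

    def publish(self, item, topics):
        """Deliver `item` once to every user registered for any of its topics."""
        targets = set()
        for t in topics:
            targets |= self.by_topic.get(t.lower(), set())
        for user in sorted(targets):
            self.outbox[user].append(item)
        return sorted(targets)
```

Indexing registrations by topic means publishing cost depends on the item's topics rather than on the total number of users, and the set union ensures a user registered for several of an item's topics receives it only once.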
Language: Chinese
Content type: Doctoral thesis
URI: http://ir.iscas.ac.cn/handle/311060/6074
Appears in Collections: Institute of Software, CAS (中科院软件所)

Files in This Item:
File Name: LW002978.pdf | File Size: 2076 KB | Content Type: -- | Version: -- | Access: Restricted | License: --
(Full text available on request from the repository.)

Recommended Citation:
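Huang Tao (黄涛). Web信息分发技术及效率研究 [Research on Web Information Dissemination Technology and Its Efficiency][D]. Institute of Software, Chinese Academy of Sciences, 2000.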
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences