Institutional Repository of the Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所机构知识库)
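Title:
Design and Implementation of a Data Exchange and Sharing Platform Based on HDFS (基于HDFS的数据交换共享平台的设计与实现)
Author: Luo Houqi (罗后启)
Defense date: 2011-06-01
Advisor: Fan Guochuang (范国闯)
Major: Computer Software and Theory
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place conferred: Beijing
Degree: Master's
Keywords: Computer Applications::Computer Information Management Systems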
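Abstract: Enterprises and government agencies run a large number of systems built at different times, by different vendors, on different platforms. Lacking unified planning and standards, these systems can rarely share information with one another, and many have become isolated business applications. Establishing unified, standardized interfaces among information systems, so that distributed, independent, and heterogeneous data can be exchanged and shared, has therefore become a main focus of new informatization work. The data exchange and sharing platform arose to meet this need. Built on a unified middleware platform, it deploys front-end node agents on the application systems to extract and transform data and transmit it to a data sharing center, which stores, manages, and distributes the scattered data in a unified way. In practice, such platforms are typically deployed in a star topology and exchange many types of data. Because many nodes must exchange large volumes of data with the data sharing center, the platform faces major challenges in throughput and reliability. To meet the platform's needs for large-volume storage and concurrent multi-connection data transfer, this thesis proposes an HDFS-based architecture. In this architecture, the data exchange process is split into two processes, metadata exchange and data file exchange; data exchange requests are distributed across the storage servers in the cluster, giving distributed, reliable storage of data files. In addition, targeting the platform's application scenarios, the thesis applies a dynamic replica management technique based on data access heat, which dynamically adjusts the replica count of hot data to reduce the time spent exchanging it; an index optimization mechanism for small files, which improves small-file exchange efficiency; and a failure recovery mechanism for data exchange. Together these improve the reliability and efficiency of data exchange. Finally, the thesis presents the design and implementation of the HDFS data exchange and sharing platform and reports experiments that verify the system's actual performance.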
English abstract:

In enterprises, there are many information systems built by various vendors on heterogeneous platforms. Because they lack unified planning and standards, sharing and exchanging data between them is difficult. How to establish uniform, standard interfaces to achieve data exchange and sharing among distributed, independent, heterogeneous systems has therefore become a hot issue.

A data exchange and sharing platform is a solution to this problem. Built on a unified middleware platform, it realizes heterogeneous data exchange and sharing by providing a client API and front-end exchange nodes deployed in the application systems. In most cases, the platform is deployed in a star topology, and data types vary widely across application domains. In this scenario, a large number of nodes need to exchange huge volumes of data with the central node, directly or indirectly, which poses a significant challenge to the central node's throughput and reliability.

To improve throughput and reliability, this thesis proposes an HDFS-based data exchange and sharing architecture. In this model, the data exchange process is broken down into two processes, metadata exchange and data exchange; this mechanism distributes data exchange requests across the storage cluster, which provides distributed and reliable data storage. Meanwhile, to suit the application scenarios, the thesis proposes a heat-based replica management model, which dynamically adjusts the number of replicas of hot data to reduce hot-data exchange time; a small-file index optimization to improve the efficiency of small-file storage; and a reliable data exchange mechanism that handles client-side and server-side failures.

The design and implementation of the HDFS data exchange and sharing platform are given at the end of this thesis, along with an experiment that verifies the system's actual performance.
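
The abstract does not say how access heat is measured or how it maps to a replica count. The following is only a minimal sketch of the general idea, not the thesis's method. It assumes heat is the number of accesses within a sliding time window and that one extra replica is added per fixed number of recent accesses. The names `HeatTracker` and `target_replicas`, the window length, and the thresholds are all hypothetical; the base of 3 matches the default HDFS replication factor.

```python
from collections import deque
import time


class HeatTracker:
    """Counts accesses per file within a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._accesses = {}  # path -> deque of access timestamps, oldest first

    def record_access(self, path, now=None):
        """Record one read of `path` at time `now` (defaults to the current time)."""
        now = time.time() if now is None else now
        self._accesses.setdefault(path, deque()).append(now)

    def heat(self, path, now=None):
        """Return the number of accesses to `path` in the last `window_seconds`."""
        now = time.time() if now is None else now
        stamps = self._accesses.get(path, deque())
        # Drop timestamps that have fallen out of the window.
        while stamps and stamps[0] <= now - self.window_seconds:
            stamps.popleft()
        return len(stamps)


def target_replicas(heat, base=3, accesses_per_extra_replica=100, max_replicas=10):
    """Map a heat value to a replica count (assumed policy, not from the thesis).

    One extra replica is added per `accesses_per_extra_replica` recent accesses,
    capped at `max_replicas`; cold data falls back to `base`.
    """
    extra = heat // accesses_per_extra_replica
    return min(base + extra, max_replicas)
```

In a real deployment, the computed count could then be applied to a file with an existing HDFS facility such as the `hadoop fs -setrep` shell command. When and how often to apply it is a design choice this sketch leaves open.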
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/10228
Appears in Collections: Software Engineering Technology Research and Development Center (软件工程技术研究开发中心) _ Theses

Files in This Item:
File Name/ File Size Content Type Version Access License
基于HDFS的数据交换共享平台的设计与实现V2-1.pdf (1705KB) -- -- Restricted access -- (contact the repository for the full text)

Recommended Citation:
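Luo Houqi (罗后启). Design and Implementation of a Data Exchange and Sharing Platform Based on HDFS (基于HDFS的数据交换共享平台的设计与实现) [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2011-06-01.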

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences