Author: 柳锴
Issued Date: 2007-06-06
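Degree Grantor: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)
Place of Degree Grantor: Institute of Software
Degree Level: Doctoral
Keyword: parallel database ; SMP cluster ; implementation model ; data partitioning ; parallel operation algorithms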
Alternative Title: Research on Implementation Technologies of Cluster Parallel Database System
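Abstract: Strong demand from many high-technology fields keeps driving rapid growth in database size, and traditional centralized database systems can no longer meet the needs of such data-intensive applications. Developing high-performance, low-cost parallel database systems is therefore of great significance. This thesis mainly explores techniques for designing and implementing a scalable parallel relational database system on distributed-memory multi-computer systems. The design rests on the following facts. First, distributed-memory multi-computer systems offer faster response times, better scalability, and a better price/performance ratio than single-machine systems and shared-memory multi-machine systems, and they are very widely used. Second, relational database management systems (RDBMS) still dominate today's database market, and relational algebra, the foundation of relational queries, is considered the formalism best suited to parallel processing. Third, relational databases can efficiently execute any query expressed in a declarative query language. We implemented a prototype parallel query system, ParaMidSQL (Parallel Middleware for SQL), on a cluster architecture. We first describe the overall architecture of the prototype, then present the design of the global dictionary, and finally discuss in detail the design and implementation of the parallel algorithm library. Two implementation approaches were considered in designing the library: the "data partitioning based" approach, which uses a divide-and-conquer strategy to realize parallel queries, and the "parallel operation algorithm based" approach, which overlaps data exchange with query operations to hide communication overhead. The former is simple to implement but performs relatively poorly; the latter performs very well but is more complex to implement. The prototype was performance-tested on a four-node SMP cluster. The results show that ParaMidSQL significantly accelerates the basic database query operations (update operations are not yet considered) relative to serial MySQL: the average speed-ups for selection, sort, and join are 2.62 (3 nodes), 3.41 (4 nodes), and 2.93 (4 nodes), respectively.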
English Abstract: The rapid growth in database scale, driven by intensive demand from numerous high-tech fields, has left traditional centralized database systems unable to cope with highly data-intensive applications. It is therefore worthwhile to research and develop high-performance parallel database systems, especially low-cost ones. In this thesis, we mainly discuss the design and implementation of a scalable parallel relational database system on distributed-memory multi-computer architectures. The approach is motivated by the following facts. First, the distributed-memory multi-computer architecture is now widespread because of its faster response time, better scalability, and better performance/cost ratio compared with single-processor and shared-memory multiprocessor architectures. Second, relational database management systems (RDBMS) dominate the marketplace, and the relational algebra on which all RDBMS query processing is based is ideally suited to parallel execution. Finally, relational databases can efficiently execute arbitrary queries written in a declarative query language. Based on a cluster architecture, we have implemented a prototype parallel query system, ParaMidSQL (Parallel Middleware for SQL). We first describe the overall architecture of the prototype system, then briefly present the design of the global dictionary, and finally discuss in detail the design and implementation of the parallel algorithm library. Two methods were considered when designing the library: one is based on data partitioning and employs a "divide and conquer" strategy to perform the parallel query; the other is based on parallel operation algorithms and hides communication overhead by overlapping data exchange with query processing. The former is easy to implement but performs poorly, while the latter is harder to develop but performs much better. Experiments on a 4-node SMP cluster were carried out to evaluate the performance of ParaMidSQL. The results show that ParaMidSQL achieves a notable speed-up over serial MySQL for the basic database query operations (update operations are not considered in this thesis). The average speed-ups of the select, sort, and join operations are 2.62 (3 nodes), 3.41 (4 nodes), and 2.93 (4 nodes), respectively.
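The "divide and conquer" idea behind the data-partitioning method can be illustrated with a minimal sketch. This is not ParaMidSQL code: the function names (`hash_partition`, `local_select`, `parallel_select`) are hypothetical, rows are plain dictionaries, and threads on one machine stand in for cluster nodes. It only shows the general pattern of partitioning the data, running the same selection on each partition, and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n_nodes):
    """Assign each row to one of n_nodes partitions by hashing its key column."""
    partitions = [[] for _ in range(n_nodes)]
    for row in rows:
        partitions[hash(row[key]) % n_nodes].append(row)
    return partitions


def local_select(partition, predicate):
    """Work done by one (simulated) node: a selection over its own partition."""
    return [row for row in partition if predicate(row)]


def parallel_select(rows, key, predicate, n_nodes=4):
    """Divide (partition), conquer (local selections in parallel), then merge."""
    partitions = hash_partition(rows, key, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # Threads are an assumption standing in for cluster nodes.
        partial_results = list(
            pool.map(local_select, partitions, [predicate] * n_nodes)
        )
    merged = []
    for part in partial_results:
        merged.extend(part)
    return merged


# Hypothetical sample data: 20 rows, selecting those with amount >= 150.
rows = [{"id": i, "amount": i * 10} for i in range(20)]
result = parallel_select(rows, "id", lambda r: r["amount"] >= 150)
print(sorted(r["id"] for r in result))
```

In this sketch all communication happens before and after the local work, which is one plausible reason such a scheme is simple but slower than an approach that overlaps data exchange with query processing, as the abstract describes for the second method.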
Language: Chinese
Content Type: Thesis
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200428015029064柳锴_paper.doc (1198KB) | -- | -- | Restricted access (contact to request full text) | --

Recommended Citation:
柳锴. 基于机群架构的并行数据库实现技术研究[D]. 软件研究所. 中国科学院软件研究所. 2007-06-06.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.


