Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
Title:
623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores
Author: Liu, YQ ; Yang, C ; Liu, FF ; Zhang, XY ; Lu, YT ; Du, YF ; Yang, CQ ; Xie, M ; Liao, XK
Keyword: Tianhe-2 ; HPCG ; conjugate gradients ; MIC ; heterogeneous computing
Source: INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS
Issued Date: 2016
Volume: 30, Issue: 1, Pages: 39-54
Indexed Type: SCI
Department: Chinese Acad Sci, Inst Software, Beijing, Peoples R China. Univ Chinese Acad Sci, Beijing, Peoples R China. Chinese Acad Sci, State Key Lab Comp Sci, Beijing, Peoples R China. Natl Univ Def Technol, Dept Comp Sci & Technol, Changsha, Hunan, Peoples R China.
Abstract: In this article, we present a new hybrid algorithm to enable and scale the high-performance conjugate gradients (HPCG) benchmark on large-scale heterogeneous systems such as the Tianhe-2. Based on an inner-outer subdomain partitioning strategy, the data distribution between host and device can be balanced adaptively. The overhead of data movement from both the MPI communication and the PCI-E transfer can be significantly reduced by carefully rearranging and fusing operations. A variety of parallelization and optimization techniques for performance-critical kernels are exploited and analyzed to maximize the performance gain on both host and device. We carry out experiments on both a small heterogeneous computer and the world's largest one, the Tianhe-2. On the small system, a thorough comparison and analysis has been presented to select from different optimization choices. On Tianhe-2, the optimized implementation scales to the full-system level of 3.12 million heterogeneous cores, with an aggregated performance of 623 Tflop/s and a parallel efficiency of 81.2%.
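The abstract names an inner-outer subdomain partitioning but does not spell it out. As a rough illustration only, the sketch below assumes a uniform-width outer shell of each local 3D subdomain goes to one side (for example the host, which handles MPI halo exchange) and the remaining inner box goes to the other side. The function names, the `host_share` parameter, and the choice of which side gets the shell are all hypothetical and not taken from the paper; the authors' actual balancing scheme may differ.

```python
def split_inner_outer(nx, ny, nz, width):
    """Return (inner, outer) point counts for an nx*ny*nz subdomain.

    Assumption: the outer part is a shell of uniform thickness `width`
    on all six faces; the inner part is the box that remains.
    """
    if width < 0 or 2 * width > min(nx, ny, nz):
        raise ValueError("shell width does not fit in the subdomain")
    inner = (nx - 2 * width) * (ny - 2 * width) * (nz - 2 * width)
    total = nx * ny * nz
    return inner, total - inner


def choose_width(nx, ny, nz, host_share):
    """Pick the shell width whose outer fraction is closest to host_share.

    `host_share` is a hypothetical tuning input: the fraction of work the
    shell's owner should receive, e.g. estimated from measured throughput.
    """
    total = nx * ny * nz
    best_width, best_gap = 0, float("inf")
    # Try every width that still fits and keep the closest match.
    for width in range(min(nx, ny, nz) // 2 + 1):
        _, outer = split_inner_outer(nx, ny, nz, width)
        gap = abs(outer / total - host_share)
        if gap < best_gap:
            best_width, best_gap = width, gap
    return best_width
```

For an 8x8x8 subdomain, a shell of width 1 holds 296 of the 512 points, so a target share of about 0.6 selects width 1. Re-running `choose_width` with an updated share is one simple way to picture the adaptive balancing the abstract mentions.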
Language: English
WOS ID: WOS:000371326000004
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/17346
Appears in Collections: Institute of Software Library_Journal Articles

Recommended Citation:
Liu, YQ, Yang, C, Liu, FF, et al. 623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores[J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2016-01-01, 30(1): 39-54.