Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
Optimizing and scaling HPCG on Tianhe-2: Early experience
Author: Zhang, Xianyi (1) ; Yang, Chao (1) ; Liu, Fangfang (1) ; Liu, Yiqun (1) ; Lu, Yutong (4)
Conference Name: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014
Conference Date: August 24, 2014 - August 27, 2014
Issued Date: 2014
Conference Place: Dalian, China
Publish Place: Springer Verlag
Indexed Type: EI
ISSN: 0302-9743
ISBN: 9783319111964
Department: (1) Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (2) State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190, China; (3) University of Chinese Academy of Sciences, Beijing 100049, China; (4) National University of Defense Technology, Changsha Hunan 410073, China
Abstract: This paper presents a first attempt at optimizing and scaling HPCG on the world's largest supercomputer, Tianhe-2. This early work focuses on optimizing the CPU code without using the Intel Xeon Phi coprocessors. We reformulate the basic CG algorithm to minimize the cost of collective communication and employ several optimization techniques, such as SIMDization, loop unrolling, forward and backward sweep fusion, and OpenMP parallelization, to further enhance the performance of kernels such as the sparse matrix-vector multiplication, the symmetric Gauss-Seidel relaxation, and the geometric multigrid V-cycle. We successfully scale the HPCG code from 256 up to 6,144 nodes (147,456 CPU cores) on Tianhe-2, with nearly ideal weak scalability and an aggregate performance of 79.83 Tflops, which is 6.38X higher than the reference implementation. © 2014 Springer International Publishing Switzerland.
Language: English
Content Type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16618
Appears in Collections: Institute of Software Library, Conference Papers

Files in This Item:

There are no files associated with this item.


Recommended Citation:
Zhang, Xianyi, Yang, Chao, Liu, Fangfang, et al. Optimizing and scaling HPCG on Tianhe-2: Early experience[C]. In: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014. Dalian, China. August 24, 2014 - August 27, 2014.