ISCAS OpenIR
Optimizing and scaling HPCG on Tianhe-2: Early experience
Zhang, Xianyi (1); Yang, Chao (1); Liu, Fangfang (1); Liu, Yiqun (1); Lu, Yutong (4)
2014
Conference Name: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014
Pages: 28-41
Conference Date: August 24, 2014 - August 27, 2014
Conference Location: Dalian, China
Indexed By: EI
Place of Publication: Springer Verlag
ISSN: 0302-9743
ISBN: 9783319111964
Affiliation: (1) Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (2) State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190, China; (3) University of Chinese Academy of Sciences, Beijing 100049, China; (4) National University of Defense Technology, Changsha Hunan 410073, China
Abstract: This paper presents a first attempt at optimizing and scaling HPCG on the world's largest supercomputer, Tianhe-2. This early work focuses on optimizing the CPU code without using the Intel Xeon Phi coprocessors. We reformulate the basic CG algorithm to minimize the cost of collective communication and employ several optimization techniques, such as SIMDization, loop unrolling, forward and backward sweep fusion, and OpenMP parallelization, to further enhance the performance of kernels such as sparse matrix-vector multiplication, symmetric Gauss-Seidel relaxation, and the geometric multigrid V-cycle. We successfully scale the HPCG code from 256 up to 6,144 nodes (147,456 CPU cores) on Tianhe-2, with nearly ideal weak scalability and an aggregate performance of 79.83 Tflops, which is 6.38X higher than the reference implementation. © 2014 Springer International Publishing Switzerland.
Language: English
Content Type: Conference Paper
URI: http://ir.iscas.ac.cn/handle/311060/16618
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Zhang, Xianyi, Yang, Chao, Liu, Fangfang, et al. Optimizing and scaling HPCG on Tianhe-2: Early experience[C]. Springer Verlag, 2014: 28-41.