ISCAS OpenIR
Optimizing and scaling HPCG on Tianhe-2: Early experience
Zhang, Xianyi (1); Yang, Chao (1); Liu, Fangfang (1); Liu, Yiqun (1); Lu, Yutong (4)
2014
Conference Name: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014
Pages: 28-41
Conference Date: August 24, 2014 - August 27, 2014
Conference Place: Dalian, China
Indexed Type: EI
Publish Place: Springer Verlag
ISSN: 0302-9743
ISBN: 9783319111964
Department: (1) Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (2) State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190, China; (3) University of Chinese Academy of Sciences, Beijing 100049, China; (4) National University of Defense Technology, Changsha Hunan 410073, China
English Abstract: In this paper, a first attempt has been made at optimizing and scaling HPCG on the world's largest supercomputer, Tianhe-2. This early work focuses on the optimization of the CPU code without using the Intel Xeon Phi coprocessors. In our work, we reformulate the basic CG algorithm to minimize the cost of collective communication and employ several optimization techniques, such as SIMDization, loop unrolling, forward and backward sweep fusion, and OpenMP parallelization, to further enhance the performance of kernels such as the sparse matrix-vector multiplication, the symmetric Gauss-Seidel relaxation and the geometric multigrid V-cycle. We successfully scale the HPCG code from 256 up to 6,144 nodes (147,456 CPU cores) on Tianhe-2, with a nearly ideal weak scalability and an aggregate performance of 79.83 Tflops, which is 6.38X higher than the reference implementation. © 2014 Springer International Publishing Switzerland.
Language: English
Content Type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16618
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Zhang, Xianyi, Yang, Chao, Liu, Fangfang, et al. Optimizing and scaling HPCG on Tianhe-2: Early experience[C]. Springer Verlag, 2014: 28-41.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.