Institutional Repository
| 623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores | |
| Liu, YQ; Yang, C; Liu, FF; Zhang, XY; Lu, YT; Du, YF; Yang, CQ; Xie, M; Liao, XK | |
| 2016 | |
| Journal | INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS |
| ISSN | 1094-3420 |
| Volume | 30; Issue: 1; Pages: 39-54 |
| Abstract | In this article, we present a new hybrid algorithm to enable and scale the high-performance conjugate gradients (HPCG) benchmark on large-scale heterogeneous systems such as the Tianhe-2. Based on an inner-outer subdomain partitioning strategy, the data distribution between host and device can be balanced adaptively. The overhead of data movement from both the MPI communication and the PCI-E transfer can be significantly reduced by carefully rearranging and fusing operations. A variety of parallelization and optimization techniques for performance-critical kernels are exploited and analyzed to maximize the performance gain on both host and device. We carry out experiments on both a small heterogeneous computer and the world's largest one, the Tianhe-2. On the small system, a thorough comparison and analysis has been presented to select from different optimization choices. On Tianhe-2, the optimized implementation scales to the full-system level of 3.12 million heterogeneous cores, with an aggregated performance of 623 Tflop/s and a parallel efficiency of 81.2%. |
| Indexed by | SCI |
| Keywords | Tianhe-2; HPCG; Conjugate Gradients; MIC; Heterogeneous Computing |
| Affiliation | Chinese Acad Sci, Inst Software, Beijing, Peoples R China. Univ Chinese Acad Sci, Beijing, Peoples R China. Chinese Acad Sci, State Key Lab Comp Sci, Beijing, Peoples R China. Natl Univ Def Technol, Dept Comp Sci & Technol, Changsha, Hunan, Peoples R China. |
| Language | English |
| WOS accession number | WOS:000371326000004 |
| Content type | Journal article |
| URI | http://ir.iscas.ac.cn/handle/311060/17346 |
| Collection | Institute of Software, Chinese Academy of Sciences |
| Recommended citation (GB/T 7714) | Liu, YQ,Yang, C,Liu, FF,et al. 623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores[J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS,2016,30(1):39-54. |
| APA | Liu, YQ.,Yang, C.,Liu, FF.,Zhang, XY.,Lu, YT.,...&Liao, XK.(2016).623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores.INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS,30(1),39-54. |
| MLA | Liu, YQ,et al."623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores".INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 30.1(2016):39-54. |
| Files in this item | There are no files associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.