ISCAS OpenIR
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs
Li, Chao (1); Yang, Yi (2); Dai, Hongwen (1); Yan, Shengen (3); Mueller, Frank (4); Zhou, Huiyang (1)
2014
Conference: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2014)
Pages: 231-242
Conference dates: March 23-25, 2014
Conference location: Monterey, CA, United States
Indexed in: EI
Publisher: IEEE Computer Society
ISBN: 9781479936052
Affiliations: (1) Department of Electrical and Computer Engineering, North Carolina State University, United States; (2) Department of Computing Systems Architecture, NEC Laboratories America, United States; (3) Institute of Software, Chinese Academy of Sciences, North Carolina State University, United States; (4) Department of Computer Science, North Carolina State University, United States
Abstract: On-chip caches are commonly used in computer systems to hide long off-chip memory access latencies. To manage on-chip caches, either software-managed or hardware-managed schemes can be employed. State-of-the-art accelerators, such as the NVIDIA Fermi or Kepler GPUs and Intel's forthcoming MIC 'Knights Landing' (KNL), support both software-managed caches, a.k.a. shared memory (GPUs) or near memory (KNL), and hardware-managed L1 data caches (D-caches). Furthermore, shared memory and the L1 D-cache on a GPU utilize the same physical storage, and their capacities can be configured at runtime (as is also the case for KNL). In this paper, we present an in-depth study to reveal interesting and sometimes unexpected tradeoffs between shared memory and the hardware-managed L1 D-caches in GPU architecture. In our study, the kernels utilizing the L1 D-caches are generated from those leveraging shared memory to ensure that the same optimizations, such as tiling, are applied equally in both versions. Our detailed analyses reveal that, rather than cache hit rates, the following tradeoffs often have more profound performance impacts. On the one hand, the kernels utilizing the L1 caches may support higher degrees of thread-level parallelism, offer more opportunities for data to be allocated in registers, and sometimes result in lower dynamic instruction counts. On the other hand, the applications utilizing shared memory enable more coalesced accesses and tend to achieve higher degrees of memory-level parallelism. Overall, our results show that most benchmarks perform significantly better with shared memory than with the L1 D-caches due to the high impact of memory-level parallelism and memory coalescing. © 2014 IEEE.
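The coalescing effect named in the abstract can be illustrated with a deliberately simplified model. This sketch is not taken from the paper: it assumes a warp of 32 threads and aligned 128-byte memory segments (the L1 line size on Fermi), and the function names are hypothetical. It counts how many distinct segments one warp-wide load touches. As an assumed example, a shared-memory kernel typically loads a tile cooperatively so that consecutive threads read consecutive words (stride 1), whereas an L1-based version of the same tiled loop may issue a strided pattern directly to global memory.

```python
# Simplified coalescing model (assumption: 32-thread warps and aligned
# 128-byte segments as the memory transaction granularity).
WARP_SIZE = 32
SEGMENT_BYTES = 128


def warp_addresses(stride_elems, elem_bytes=4):
    """Byte addresses read by one warp when thread t reads element t * stride_elems."""
    return [t * stride_elems * elem_bytes for t in range(WARP_SIZE)]


def transactions_per_warp(byte_addresses):
    """Count the distinct aligned 128-byte segments touched by one warp's load."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


if __name__ == "__main__":
    n = 1024  # hypothetical row length of a row-major float array
    # Row-wise access: consecutive threads read consecutive 4-byte words.
    print("row-wise:", transactions_per_warp(warp_addresses(1)))
    # Column-wise access: consecutive threads are one row (n elements) apart.
    print("column-wise:", transactions_per_warp(warp_addresses(n)))
```

Under these assumptions the row-wise load needs one segment, while the column-wise load needs one segment per thread, i.e. 32 separate transactions for the same amount of useful data.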
Language: English
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16614
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Li, Chao, Yang, Yi, Dai, Hongwen, et al. Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs[C]. IEEE Computer Society, 2014: 231-242.
Files in this item: No files are associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.