A locality-based performance model for load-and-compute style computation

ISCAS OpenIR

	A locality-based performance model for load-and-compute style computation
	Yuan Liang; Zhang Yunquan
	2012
Conference Name	IEEE International Conference on Cluster Computing
Proceedings Title	Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012
Pages	566-571
Conference Date	September 24-28, 2012
Conference Location	Beijing, People's Republic of China
Indexed By	ISTP; EI
ISSN	1552-5244
Affiliation	Yuan Liang; Zhang Yunquan: Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100864, People's Republic of China.
Abstract	The increasing speed gap between the processor and memory is usually the critical bottleneck in achieving high performance. Hardware caches, programming models, algorithms and data structures have been proposed to exploit locality and reduce memory overhead. Some of these designs share a common load-and-compute style, in which the algorithm first moves all needed data into the cache and then operates only on the data that is ready. In this paper, we introduce a locality function to model the reuse ability of an algorithm and propose a corresponding performance model. We then analyze theoretically how to use and design caches under our model: (1) We present theorems that give the optimal cache partition scheme for the software buffering technique, which aims to hide memory overhead. (2) We provide methods to decide the optimal multicore design that best leverages the benefits of both shared and private caches. (3) We incorporate memory overhead into Amdahl's Law to study the speedup limit imposed by memory bandwidth.
Keywords	Locality Function; Cache Partition; Private Cache; Shared Cache
Sponsors	IEEE, IEEE Comp Soc, IEEE Tech Comm Scalable Comp (TCSC), Sugon, Intel, Inspur, VMware, Mellanox, PARATERA, BLSC, LoongStore, Nvidia
Subject Area	Computer Science
Language	English
Content Type	Conference paper
URI	http://ir.iscas.ac.cn/handle/311060/15803
Collection	Institute of Software, Chinese Academy of Sciences
Recommended Citation (GB/T 7714)	Yuan Liang, Zhang Yunquan. A locality-based performance model for load-and-compute style computation[C], 2012: 566-571.