ISCAS OpenIR
AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs
Wang, Qian (1); Zhang, Xianyi (1); Zhang, Yunquan (2); Yi, Qing (3)
2013
Conference name: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Conference date: November 17, 2013 - November 22, 2013
Conference location: Denver, CO, United States
Indexed by: EI
Place of publication: IEEE Computer Society
ISSN: 2167-4329
ISBN: 9781450323789
Affiliations: (1) Institute of Software, Chinese Academy of Sciences, University of Chinese, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, State Key Lab of Computer Architecture, Beijing, China; (3) University of Colorado at Colorado Springs, Colorado, United States
Abstract: Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual interference from developers. In particular, based on domain-specific knowledge about algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and thereby translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of Intel MKL and AMD ACML BLAS libraries, on both Intel Sandy Bridge and AMD Piledriver processors. Copyright 2013 ACM.
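For readers unfamiliar with the kernels named in the abstract, the following is a minimal reference sketch in C of two of them, AXPY (y = alpha*x + y) and DOT (the inner product of x and y). It is not code from the paper or from AUGEM: the function names `axpy_ref` and `dot_ref` and the unroll factor of 4 are assumptions chosen for illustration. The unrolled loop bodies only suggest the kind of repeated multiply-add sequence that a template-based optimizer could recognize and map to SIMD instructions.

```c
#include <stddef.h>

/* AXPY: y[i] = alpha * x[i] + y[i] for i in [0, n).
 * Hypothetical reference version, unrolled by 4 for illustration. */
void axpy_ref(size_t n, double alpha, const double *x, double *y) {
    size_t i = 0;
    /* Main loop: four independent multiply-add statements per iteration. */
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}

/* DOT: returns the sum over i of x[i] * y[i].
 * Four partial accumulators keep the multiply-adds independent. */
double dot_ref(size_t n, const double *x, const double *y) {
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    /* Remainder elements are folded into the first accumulator. */
    for (; i < n; ++i) {
        acc0 += x[i] * y[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that splitting the DOT sum across several accumulators changes the order of floating-point additions, so results can differ in the last bits from a strictly sequential sum.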
Language: English
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16662
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Wang, Qian, Zhang, Xianyi, Zhang, Yunquan, et al. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs[C]. IEEE Computer Society, 2013.