Title: | AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs |
Author: | Wang, Qian (1)
; Zhang, Xianyi (1)
; Zhang, Yunquan (2)
; Yi, Qing (3)
|
Conference Name: | 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
|
Conference Date: | November 17, 2013 - November 22, 2013
|
Issued Date: | 2013
|
Conference Place: | Denver, CO, United States
|
Publish Place: | IEEE Computer Society
|
Indexed Type: | EI
|
ISSN: | 21674329
|
ISBN: | 9781450323789
|
Department: | (1) Institute of Software, Chinese Academy of Sciences, University of Chinese, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, State Key Lab of Computer Architecture, Beijing, China; (3) University of Colorado at Colorado Springs, Colorado, United States
|
Abstract: | Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on a variety of multi-core CPUs without requiring any manual intervention from developers. In particular, based on domain-specific knowledge about the algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and thereby translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of the Intel MKL and AMD ACML BLAS libraries on both Intel Sandy Bridge and AMD Piledriver processors. Copyright 2013 ACM. |
Language: | English
|
Content Type: | Conference paper
|
URI: | http://ir.iscas.ac.cn/handle/311060/16662
|
Appears in Collections: | Institute of Software Library_Conference Papers
|
There are no files associated with this item.
|
Recommended Citation: |
Wang, Qian, Zhang, Xianyi, Zhang, Yunquan, et al. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs[C]. In: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013. Denver, CO, United States. November 17, 2013 - November 22, 2013.
|