ISCAS OpenIR
AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs
Wang, Qian (1); Zhang, Xianyi (1); Zhang, Yunquan (2); Yi, Qing (3)
2013
Conference Name: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Conference Date: November 17, 2013 - November 22, 2013
Conference Place: Denver, CO, United States
Indexed Type: EI
Publish Place: IEEE Computer Society
ISSN: 2167-4329
ISBN: 9781450323789
Department: (1) Institute of Software, Chinese Academy of Sciences, University of Chinese, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, State Key Lab of Computer Architecture, Beijing, China; (3) University of Colorado at Colorado Springs, Colorado, United States
English Abstract: Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual interference from developers. In particular, based on domain-specific knowledge about algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and thereby translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of Intel MKL and AMD ACML BLAS libraries, on both Intel Sandy Bridge and AMD Piledriver processors. Copyright 2013 ACM.
Language: English
Content Type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16662
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Wang, Qian, Zhang, Xianyi, Zhang, Yunquan, et al. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs[C]. IEEE Computer Society, 2013.