Title: AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs
Author: Wang, Qian (1) ; Zhang, Xianyi (1) ; Zhang, Yunquan (2) ; Yi, Qing (3)
Conference Name: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Conference Date: November 17, 2013 - November 22, 2013
Issued Date: 2013
Conference Place: Denver, CO, United States
Publish Place: IEEE Computer Society
Indexed Type: EI
ISSN: 2167-4329
ISBN: 9781450323789
Department: (1) Institute of Software, Chinese Academy of Sciences, University of Chinese, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, State Key Lab of Computer Architecture, Beijing, China; (3) University of Colorado at Colorado Springs, Colorado, United States
Abstract: Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual interference from developers. In particular, based on domain-specific knowledge about algorithms of the DLA kernels, we use a collection of parameterized code templates to formulate a number of commonly occurring instruction sequences within the optimized low-level C code of these DLA kernels. Then, our framework uses a specialized low-level C optimizer to identify instruction sequences that match the pre-defined code templates and thereby translates them into extremely efficient SSE/AVX instructions. The DLA kernels generated by our template-based approach surpass the implementations of Intel MKL and AMD ACML BLAS libraries, on both Intel Sandy Bridge and AMD Piledriver processors. Copyright 2013 ACM.
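Illustrative sketch (not taken from the paper): the abstract does not reproduce AUGEM's templates, so the C code below only suggests what "commonly occurring instruction sequences within the optimized low-level C code" might look like for AXPY. The function names `axpy_ref` and `axpy_unrolled4` and the unroll factor of 4 are assumptions made for this example. The unrolled loop body repeats a load, multiply-add, store pattern four times, which is the sort of regular sequence a template matcher could plausibly map to one SSE/AVX vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Reference AXPY: y[i] += alpha * x[i] for i in [0, n). */
void axpy_ref(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Hypothetical "low-level C" form: unrolled by 4 with scalar temporaries.
 * Each group of four statements below has the same shape, so a pattern
 * matcher could in principle recognize it and emit vector instructions. */
void axpy_unrolled4(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* loads */
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        double y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];
        /* multiply-adds */
        y0 += alpha * x0;
        y1 += alpha * x1;
        y2 += alpha * x2;
        y3 += alpha * x3;
        /* stores */
        y[i] = y0;
        y[i + 1] = y1;
        y[i + 2] = y2;
        y[i + 3] = y3;
    }
    /* remainder loop for the last n % 4 elements */
    for (; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Both functions compute the same result; the second simply exposes the repeated pattern explicitly, as a hand-written stand-in for the kind of input a specialized low-level C optimizer might work on.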
Language: English
Content Type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16662
Appears in Collections: Institute of Software Library, Conference Papers

Recommended Citation:
Wang, Qian, Zhang, Xianyi, Zhang, Yunquan, et al. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs[C]. In: 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013. Denver, CO, United States. November 17, 2013 - November 22, 2013.