ISCAS OpenIR
OpenBLAS: A High Performance BLAS Library on Loongson 3A CPU
Zhang Xian-Yi; Wang Qian; Zhang Yun-Quan
2011
Source: Ruan Jian Xue Bao/Journal of Software
ISSN: 1000-9825
Volume: 22  Issue: UPPL. 2  Pages: 208-216
English Abstract: BLAS is a fundamental math library in scientific computing, so each CPU vendor releases an optimized BLAS library for its own CPUs. The Loongson CPU series is developed by the Institute of Computing Technology, Chinese Academy of Sciences, which released the Loongson 3 series in 2010. This paper introduces the open source BLAS library OpenBLAS, which is forked from the GotoBLAS2-1.13 BSD version. The BLAS Level 3 functions of OpenBLAS are optimized for the quad-core Loongson 3A CPU. The sequential optimizations use blocking, hand-coded assembly kernels, Loongson 3A-specific instructions, and instruction reordering. With these, the BLAS Level 3 subroutines outperformed GotoBLAS and ATLAS by about 75% and 17%, respectively, and by about 103% and 36% in double precision functions.
In the parallel multithreaded optimization, this study used an interleaved data buffer layout to avoid conflicts among threads in the shared L2 cache. OpenBLAS achieved a speedup of 3.47 on four cores. With 4 threads, the OpenBLAS BLAS Level 3 functions outperformed GotoBLAS and ATLAS by about 69% and 34%, respectively, and by about 89% and 55% in double precision functions. ©2011 Journal of Software.
Indexed Type: EI
Keyword: Computer Software; Software Engineering
Department: (1) Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (2) State Key Laboratory of Computing Science, Chinese Academy of Sciences, Beijing 100190, China; (3) Graduate University, Chinese Academy of Sciences, Beijing 100190, China
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16164
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714: Zhang Xian-Yi, Wang Qian, Zhang Yun-Quan. OpenBLAS: A high performance BLAS library on Loongson 3A CPU[J]. Ruan Jian Xue Bao/Journal of Software, 2011, 22(UPPL. 2): 208-216.
APA: Zhang Xian-Yi, Wang Qian, & Zhang Yun-Quan. (2011). OpenBLAS: A high performance BLAS library on Loongson 3A CPU. Ruan Jian Xue Bao/Journal of Software, 22(UPPL. 2), 208-216.
MLA: Zhang Xian-Yi, et al. "OpenBLAS: A High Performance BLAS Library on Loongson 3A CPU." Ruan Jian Xue Bao/Journal of Software 22.UPPL. 2 (2011): 208-216.