Univ Texas Austin, Dept Comp Sci, 2317 Speedway,Stop D9500, Austin, TX 78712 USA. Univ Texas Austin, Inst Computat Engn & Sci, 2317 Speedway,Stop D9500, Austin, TX 78712 USA. Univ Complutense Madrid, Dept Arquitectura Comp & Automat, E-28040 Madrid, Spain. Intel Corp, Parallel Comp Lab, 2200 Mission Coll Blvd, Santa Clara, CA 95054 USA. Chinese Acad Sci, Inst Software, Beijing 100190, Peoples R China. Chinese Acad Sci, Grad Univ, Beijing 100190, Peoples R China. IBM Corp, Mail Stop 903-3L008,11501 Burnet Rd, Austin, TX 78758 USA. IBM Corp, Thomas J Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA. Cray Inc, 901 Fifth Ave,Suite 1000, Seattle, WA 98164 USA.
Abstract:
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show how, with very little effort, the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD's ACML, IBM's ESSL, and Intel's MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework's leverage extends to the multithreaded domain.