Assessing the GPU Offload Threshold of GEMM and GEMV Kernels on Modern Heterogeneous HPC Systems

Wilkinson, Finn and Cockrean, Alex and Lin, Wei-Chen and McIntosh-Smith, Simon and Deakin, Tom

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems held in conjunction with Supercomputing (PMBS), 2024

Abstract

With an ever-growing compute advantage over CPUs, GPUs are often used in workloads with ample BLAS computation to improve performance. However, several factors, including data-to-compute ratio, amount of data re-use, and data structure shape, can all impact performance. Hence, using a GPU is not a guarantee of better BLAS performance. In this work, we introduce the GPU BLAS Offload Benchmark (GPU-BLOB), a novel and portable benchmark that measures CPU and GPU compute performance of different BLAS kernels and problem configurations. Using the GPU offload threshold (a BLAS kernel’s minimum dimensions for a certain configuration where using a GPU is guaranteed to yield improved performance), we evaluate the per-node performance of three in-production HPC systems. We show that the offload threshold for GEMM is highly dependent on problem shape and number of consecutive BLAS calls, and that, contrary to conventional wisdom, GEMV can benefit from GPU acceleration, especially on SoC-based systems.
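
To make the offload threshold definition concrete, the following is a minimal sketch of how such a threshold could be derived from paired CPU and GPU timings. It is not GPU-BLOB's implementation: the function name offload_threshold, the input layout, and the timing values are all hypothetical, and the sketch assumes the problem sizes are sorted in ascending order and that the caller has already decided what the GPU timings include (for example, whether data transfer is counted).

```python
from typing import Optional, Sequence


def offload_threshold(sizes: Sequence[int],
                      cpu_times: Sequence[float],
                      gpu_times: Sequence[float]) -> Optional[int]:
    """Return the smallest tested size from which the GPU is faster than the
    CPU at that size and at every larger tested size, or None if no such
    size exists. Hypothetical helper; assumes `sizes` is sorted ascending."""
    if not (len(sizes) == len(cpu_times) == len(gpu_times)):
        raise ValueError("sizes, cpu_times and gpu_times must have equal length")

    threshold = None
    for size, cpu, gpu in zip(sizes, cpu_times, gpu_times):
        if gpu < cpu:
            # Start (or continue) a run of sizes where the GPU wins.
            if threshold is None:
                threshold = size
        else:
            # The CPU wins or ties here, so any earlier candidate is invalid.
            threshold = None
    return threshold


# Made-up timings in milliseconds for square problems of dimension `sizes`.
sizes = [64, 128, 256, 512, 1024]
cpu_ms = [0.1, 0.7, 1.5, 6.0, 24.0]
gpu_ms = [0.5, 0.6, 1.6, 2.0, 5.0]

print(offload_threshold(sizes, cpu_ms, gpu_ms))
```

With these made-up timings the GPU is briefly faster at size 128 but slower again at 256, so the sketch reports 512: the "guaranteed" wording in the definition is modelled by resetting the candidate whenever the CPU wins at a larger size.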

@inproceedings{pmbs24-gpu-offload,
  author = {Wilkinson, Finn and Cockrean, Alex and Lin, Wei-Chen and McIntosh-Smith, Simon and Deakin, Tom},
  title = {{Assessing the GPU Offload Threshold of GEMM and GEMV Kernels on Modern Heterogeneous HPC Systems}},
  booktitle = {{International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems held in conjunction with Supercomputing (PMBS)}},
  year = {2024},
  publisher = {{IEEE}},
  keywords = {Conferences and Workshops},
  pdf = {https://hdl.handle.net/1983/94ff214a-cb24-4d91-af3c-cd6f742606e6},
  doi = {10.1109/SCW63240.2024.00188}
}