Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream

Hammond, Jeff R. and Deakin, Tom and Cownie, James and McIntosh-Smith, Simon

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems held in conjunction with Supercomputing (PMBS), 2022

Abstract

Fortran DO CONCURRENT has emerged as a new way to achieve parallel execution of loops on CPUs and GPUs. This paper studies the performance portability of this construct on a range of processors and compares it with the incumbent models: OpenMP, OpenACC and CUDA. To do this study fairly, we implemented the BabelStream memory bandwidth benchmark from scratch, entirely in modern Fortran, for all of the models considered, which include Fortran DO CONCURRENT, as well as two variants of OpenACC, four variants of OpenMP (2 CPU and 2 GPU), CUDA Fortran, and both loop- and array-based references. BabelStream Fortran matches the C++ implementation as closely as possible, and can be used to make language-based comparisons. This paper represents one of the first detailed studies of the performance of Fortran support on heterogeneous architectures; we include results for AArch64 and x86-64 CPUs as well as AMD, Intel and NVIDIA GPU platforms.
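To make the construct concrete, here is a minimal sketch (not taken from the paper's BabelStream Fortran sources) of the BabelStream Triad kernel, a = b + scalar*c, written with DO CONCURRENT and checked against the plain-loop and array-syntax forms that the abstract mentions as references. The program name triad_sketch, the array size, and the initial values are illustrative assumptions.

```fortran
! Minimal sketch of a BabelStream-style Triad kernel (a = b + scalar*c),
! written with DO CONCURRENT and checked against a plain DO loop and
! array syntax. Program name, array size and initial values are
! illustrative assumptions, not the paper's actual code.
program triad_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: n = 1024 * 1024
  real(real64), parameter :: scalar = 0.4_real64
  real(real64), parameter :: tol = 1.0e-12_real64
  real(real64), allocatable :: a(:), b(:), c(:), ref(:)
  integer :: i

  allocate(a(n), b(n), c(n), ref(n))
  b = 0.2_real64
  c = 0.1_real64

  ! DO CONCURRENT: the programmer asserts that iterations are independent,
  ! so the compiler is free to execute them in any order or in parallel.
  do concurrent (i = 1:n)
    a(i) = b(i) + scalar * c(i)
  end do

  ! Loop-based reference (serial DO loop).
  do i = 1, n
    ref(i) = b(i) + scalar * c(i)
  end do
  if (any(abs(a - ref) > tol)) error stop "DO CONCURRENT differs from DO loop"

  ! Array-based reference (whole-array expression).
  ref = b + scalar * c
  if (any(abs(a - ref) > tol)) error stop "DO CONCURRENT differs from array syntax"

  ! Triad reads two arrays and writes one, so it moves 3 * n elements.
  print '(a,i0,a)', "Triad moved ", 3_int64 * n * storage_size(a) / 8, " bytes"
  print '(a)', "ok"
end program triad_sketch
```

Built with any Fortran 2008 compiler, it should print the byte count and then ok. Whether the DO CONCURRENT loop actually runs in parallel, on CPU threads or on a GPU, depends on the compiler and the flags it is given; the construct only grants permission to parallelize.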

in press

@inproceedings{pmbs22-fortran,
  author = {Hammond, Jeff R. and Deakin, Tom and Cownie, James and McIntosh-Smith, Simon},
  title = {Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream},
  booktitle = {{International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems held in conjunction with Supercomputing (PMBS)}},
  year = {2022},
  publisher = {{IEEE}},
  note = {in press},
  keywords = {In press}
}