GPU-STREAM: now in 2D!


We present a major update to the GPU-STREAM benchmark, first shown at SC15. The original benchmark allowed comparison of achievable memory bandwidth through the STREAM kernels on OpenCL devices. GPU-STREAM v2.0 extends the benchmark in another dimension: the kernels are implemented in a wide range of popular, state-of-the-art parallel programming models. This allows an intuitive comparison of performance across a diverse set of programming models and devices, and lets us investigate whether the choice of model matters for performance and performance portability. In particular, we investigate 7 parallel programming models (OpenMP 4.x, OpenACC, Kokkos, RAJA, SYCL, CUDA and OpenCL) across 12 devices (6 GPUs from NVIDIA and AMD, Intel Xeon Phi (Knights Landing), 4 generations of Intel Xeon CPUs, and IBM Power 8).
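For readers unfamiliar with STREAM, the four standard kernels (Copy, Scale, Add and Triad) and the usual way of converting a timing into a bandwidth figure can be sketched as follows. This is a minimal illustration in plain Python, not code from GPU-STREAM; the function names (`stream_copy`, `stream_mul`, `stream_add`, `stream_triad`, `bytes_moved`, `bandwidth_gb_per_s`) and the 8-byte word size are assumptions made for the example, and interpreted loops like these show only the semantics of the kernels, not achievable bandwidth.

```python
# Illustrative sketch of the four standard STREAM kernels.
# All names here are hypothetical and are not taken from GPU-STREAM.

def stream_copy(a, c):
    # Copy: c[i] = a[i]
    for i in range(len(a)):
        c[i] = a[i]

def stream_mul(b, c, scalar):
    # Scale: b[i] = scalar * c[i]
    for i in range(len(c)):
        b[i] = scalar * c[i]

def stream_add(a, b, c):
    # Add: c[i] = a[i] + b[i]
    for i in range(len(a)):
        c[i] = a[i] + b[i]

def stream_triad(a, b, c, scalar):
    # Triad: a[i] = b[i] + scalar * c[i]
    for i in range(len(b)):
        a[i] = b[i] + scalar * c[i]

# Number of arrays each kernel reads or writes, following the
# conventional STREAM byte-counting rules.
ARRAYS_TOUCHED = {"copy": 2, "mul": 2, "add": 3, "triad": 3}

def bytes_moved(kernel, n, word_size=8):
    # Total bytes read plus written for one pass over arrays of length n.
    # word_size=8 assumes double-precision elements.
    return ARRAYS_TOUCHED[kernel] * n * word_size

def bandwidth_gb_per_s(kernel, n, seconds, word_size=8):
    # Bandwidth in GB/s (10^9 bytes per second) for one timed pass.
    return bytes_moved(kernel, n, word_size) / seconds / 1e9
```

Under these assumptions, a Triad pass over arrays of 1,000 double-precision elements counts as 24,000 bytes moved, and a Copy pass over 10^9 elements that takes one second corresponds to 16 GB/s.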

International Conference for High Performance Computing, Networking, Storage and Analysis