Analyzing Reduction Abstraction Capabilities


Reductions are a common pattern in parallel programming, and every parallel programming language or framework has its own reduction abstraction with its own idiosyncrasies. These abstractions differ not only in their syntax, but also in their semantics and their ability to express certain types of reduction. Such differences may prevent specific combinations of abstraction and hardware platform from reaching high levels of performance, with consequences for portability and programmer productivity. In this paper, we present a set of representative reduction benchmarks to explore the capabilities of five contemporary programming languages and frameworks, namely OpenMP, Kokkos, RAJA, SYCL, and the oneAPI DPC++ Library (oneDPL), across a variety of hardware platforms, including CPUs and GPUs from multiple vendors. We discuss the advantages and disadvantages of each reduction abstraction, and conclude with recommendations to improve their design and implementation.

International Workshop on Performance, Portability and Productivity in HPC