Analyzing Reduction Abstraction Capabilities

Reductions are a common pattern in parallel programming, and every parallel programming language or framework has its own reduction abstraction with its own idiosyncrasies. These abstractions differ not only in their syntax, but also in their …

Benchmarking and Extending SYCL Hierarchical Parallelism

SYCL is an open-standard parallel programming model from Khronos for programming heterogeneous devices. It allows single-source programming of diverse attached devices in a cross-platform manner in modern C++. SYCL provides different layers of …

On measuring the maturity of SYCL implementations by tracking historical performance improvements

SYCL is a platform-agnostic, single-source, C++-based parallel programming framework for developing platform-independent software for heterogeneous systems. As an emerging framework, SYCL has been under active development for several years, with …

Interpreting and Visualizing Performance Portability Metrics

Recent work has introduced a number of tools and techniques for reasoning about the interplay between application performance and portability, or "performance portability". These tools have proven useful for setting goals and guiding high-level …

Tracking Performance Portability on the Yellow Brick Road to Exascale

With Exascale machines on our immediate horizon, there is a pressing need for applications to be made ready to best exploit these systems. However, there will be multiple paths to Exascale, with each system relying on processor and accelerator …

Hostile Cache Implications for Small, Dense Linear Solves

The full assembly of the stiffness matrix in finite element codes can be prohibitive because of the memory footprint of storing that enormous matrix. An optimisation and workaround, particularly effective for discontinuous Galerkin based …

Developing a mini-app for exploring algorithms for unstructured mesh deterministic discrete ordinates transport on many-core architectures

Recent trends in computational architecture design are yielding processors with deep and complex memory hierarchies consisting of small capacity caches and large capacity main memory. CPU parallelism is also hierarchical, consisting of SIMD vector …

Evaluating the performance of HPC-style SYCL applications

SYCL is a parallel programming model for developing single-source programs that run on heterogeneous platforms. To this end, it allows one code to be written that can run on different architectures. For this study, we develop applications …

Performance Portability Across Diverse Computer Architectures

Previous studies into performance portability have typically analysed a single application (and its various implementations) in isolation. In this study we explore the wider landscape of performance portability by considering a number of applications …

Scaling Results from the First Generation of Arm-based Supercomputers

In this paper we present the first scaling results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimised specifically for HPC. Isambard is a Cray XC50 'Scout' system, combining Marvell ThunderX2 Arm-based …