
Interpreting and Visualizing Performance Portability Metrics

Recent work has introduced a number of tools and techniques for reasoning about the interplay between application performance and portability, or "performance portability". These tools have proven useful for setting goals and guiding high-level …

Tracking Performance Portability on the Yellow Brick Road to Exascale

With Exascale machines on our immediate horizon, there is a pressing need for applications to be made ready to best exploit these systems. However, there will be multiple paths to Exascale, with each system relying on processor and accelerator …

Hostile Cache Implications for Small, Dense Linear Solves

The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of the memory footprint required to store that enormous matrix. An optimisation and workaround, particularly effective for discontinuous Galerkin based …

Developing a mini-app for exploring algorithms for unstructured mesh deterministic discrete ordinates transport on many-core architectures

Recent trends in computational architecture design are yielding processors with deep and complex memory hierarchies consisting of small capacity caches and large capacity main memory. CPU parallelism is also hierarchical, consisting of SIMD vector …

Evaluating the performance of HPC-style SYCL applications

SYCL is a parallel programming model for developing single-source programs for running on heterogeneous platforms. To this end, it allows a single code to be written that can run on different architectures. For this study, we develop applications …

Performance Portability Across Diverse Computer Architectures

Previous studies into performance portability have typically analysed a single application (and its various implementations) in isolation. In this study we explore the wider landscape of performance portability by considering a number of applications …

Scaling Results from the First Generation of Arm-based Supercomputers

In this paper we present the first scaling results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimised specifically for HPC. Isambard is a Cray XC50 ‘Scout’ system, combining Marvell ThunderX2 Arm-based …

UnSNAP: A Mini-App for Exploring the Performance of Deterministic Discrete Ordinates Transport on Unstructured Meshes

Solving the deterministic discrete ordinates neutral particle transport equation is a computationally expensive application. On an unstructured mesh, the discontinuous Galerkin finite element method is used for discretisation of the spatial domain. …

Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard

In this paper we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimised specifically for HPC. Isambard is the first Cray XC50 ‘Scout’ system, combining Cavium ThunderX2 …

TeaLeaf: A mini-application to enable design-space explorations for iterative sparse linear solvers

Iterative sparse linear solvers are an important class of algorithm in high performance computing, and form a crucial component of many scientific codes. As intra- and inter-node parallelism continues to increase rapidly, the design of new, scalable …