Publications

(2020). Tracking Performance Portability on the Yellow Brick Road to Exascale. P3HPC.

(2020). Interpreting and Visualizing Performance Portability Metrics. P3HPC.

(2020). Hostile Cache Implications for Small, Dense Linear Solves. MCHPC'20.

(2020). Developing a mini-app for exploring algorithms for unstructured mesh deterministic discrete ordinates transport on many-core architectures. M&C2019.

(2020). Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic SN Transport Sweeps on Many-core Architectures. JCTT.

(2020). Evaluating the performance of HPC-style SYCL applications. IWOCL/SYCLCon.

(2019). Performance Portability Across Diverse Computer Architectures. P3HPC.

(2019). Reviewing the Computational Performance of Deterministic SN Transport Sweeps on Many-core Architectures. ICTT-26.

(2019). A performance analysis of the first generation of HPC-optimized Arm processors. Concurrency and Computation: Practice and Experience (Special Issue).

(2019). Scaling Results from the First Generation of Arm-based Supercomputers. CUG.

(2018). Evaluating attainable memory bandwidth of parallel programming models via BabelStream. IJCSE.

(2018). UnSNAP: A Mini-App for Exploring the Performance of Deterministic Discrete Ordinates Transport on Unstructured Meshes. WRAp 2018.

(2018). An Improved Parallelism Scheme for Deterministic Discrete Ordinates Transport. IJHPCA.

(2018). Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard. CUG.

(2017). Portable methods for measuring cache hierarchy performance. SC.

(2017). TeaLeaf: A mini-application to enable design-space explorations for iterative sparse linear solvers. WRAp 2017.

(2017). On the mitigation of cache hostile memory access patterns on many-core CPU architectures. IXPUG.

(2017). The MEGA-STREAM benchmark on Intel Xeon Phi processors (Knights Landing). IXPUG.

(2017). Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing). IXPUG.

(2016). GPU-STREAM: now in 2D! SC.

(2016). Many-core acceleration of a discrete ordinates transport mini-app at extreme scale. ISC.

(2015). Expressing Parallelism on Many-Core for Deterministic Discrete Ordinates Transport. WRAp 2015.
