Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia

Olgu, Kaan and Kenter, Tobias and Nunez-Yanez, Jose and McIntosh-Smith, Simon and Deakin, Tom

Twelfth Workshop on Accelerator Programming and Directives (WACCPD), 2025

Abstract

Efficient graph processing is essential for a wide range of applications. Scalability and memory access patterns are still a challenge, especially with the Breadth-First Search algorithm. This work focuses on leveraging multi-GPU HPC nodes with peer-to-peer support of the Intel oneAPI implementation of SYCL. We propose three GPU-based load-balancing methods: work-group localisation for efficient data access, even workload distribution for higher GPU occupancy, and a hybrid strided-access approach for heuristic balancing. These methods ensure performance, portability, and productivity with a unified codebase. Our proposed methodologies outperform state-of-the-art single-GPU implementations based on CUDA on synthetic RMAT graphs. We analysed BFS performance across NVIDIA A100, Intel Max 1550, and AMD MI300X GPUs, achieving a peak performance of 153.27 GTEPS on an RMAT25-64 graph using 8 GPUs on the NVIDIA A100. Furthermore, our work handles RMAT graphs up to scale 29, achieving superior performance on synthetic graphs and competitive results on real-world datasets.

@inproceedings{waccpd25,
  author = {Olgu, Kaan and Kenter, Tobias and Nunez-Yanez, Jose and McIntosh-Smith, Simon and Deakin, Tom},
  title = {{Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel \& Nvidia}},
  year = {2025},
  booktitle = {{Twelfth Workshop on Accelerator Programming and Directives (WACCPD)}},
  keywords = {Conferences and Workshops}
}