Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism

Lane Cassell, Thomas and Deakin, Tom and Alpay, Aksel and Heuveline, Vincent and Brito Gadeschi, Gonzalo

Workshop on Irregular Applications: Architectures and Algorithms held in conjunction with Supercomputing (P3HPC), 2024

Abstract

The Barnes-Hut approximation for N-body simula- tions reduces the time complexity of the naive all-pairs approach from O(N2) to O(N log N) by hierarchically aggregating nearby particles into single entities using a tree data structure. This inherently irregular algorithm poses substantial challenges for performance portable implementations on multi-core CPUs and GPUs. We introduce two portable fully-parallel Barnes-Hut implementation strategies that trade-off different levels of GPU support for performance: an unbalanced concurrent octree, and a balanced bounding volume hierarchy sorted by a Hilbert space- filling curve. We implemente these algorithms in portable ISO C++ using parallel algorithms and concurrency primitives like atomics. The results demonstrate competitive performance on a range of CPUs and GPUs. Additionally, they highlight the effectiveness of the par execution policy for highly concurrent irregular algorithms, outperforming par_unseq on CPUs and GPUs with Independent Forward Progress.

@inproceedings{ia3-24,
  author = {Lane Cassell, Thomas and Deakin, Tom and Alpay, Aksel and Heuveline, Vincent and Brito Gadeschi, Gonzalo},
  title = {{Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism}},
  booktitle = {{Workshop on Irregular Applications: Architectures and Algorithms held in conjunction with Supercomputing (P3HPC)}},
  year = {2024},
  publisher = {{IEEE}},
  keywords = {Conferences and Workshops},
  pdf = {https://hdl.handle.net/1983/a61da15d-738e-4c23-afd7-90dcea70f3a2}
}