Expressing Parallelism on Many-Core for Deterministic Discrete Ordinates Transport


In this paper we demonstrate techniques for increasing the node-level parallelism of a deterministic discrete ordinates neutral particle transport algorithm on a structured mesh to exploit many-core technologies. Transport calculations form a large part of the computational workload of physical simulations and so good performance is vital for the simulations to complete in reasonable time. We will demonstrate our approach utilizing the SNAP mini-app, which gives a simplified implementation of the full transport algorithm but remains similar enough to the real algorithm to act as a useful proxy for research purposes. We present an OpenCL implementation of our improved algorithm which demonstrates a speedup of up to 2.5x the transport sweep performance on a many-core GPGPU device compared to a state-of-the-art multi-core node, the first time this scale of speedup has been achieved for algorithms of this class.

IEEE International Conference on Cluster Computing (CLUSTER): Workshop on Representative Applications