Reviewing the Computational Performance of Deterministic SN Transport Sweeps on Many-core Architectures

Peer-reviewed abstract and presentation at the conference.

Portable methods for measuring cache hierarchy performance

There has been a recent influx of different processor architecture designs into the market, with many of them targeting HPC applications. When estimating application performance, developers are used to considering the most common figures of merit, …

Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing)

The MEGA-STREAM benchmark on Intel Xeon Phi processors (Knights Landing)

GPU-STREAM: now in 2D!

We present a major update to the GPU-STREAM benchmark implementation, first shown at SC15. The original benchmark allowed comparison of achievable memory bandwidth performance through the STREAM kernels on OpenCL devices. GPU-STREAM v2.0 extends the …

GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units

Many scientific codes consist of memory-bandwidth-bound kernels: the dominant factor in the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units. General-Purpose Graphics Processing Units (GPGPUs) and …