Portable methods for measuring cache hierarchy performance


A wide range of new processor architectures has recently entered the market, many of them targeting HPC applications. When estimating application performance, developers typically consider the most common figures of merit, such as peak FLOP/s, memory bandwidth, and core count. In this study, we present a detailed comparison of on-chip memory bandwidths, both for a single core and in aggregate across a node, for a set of next-generation CPUs. Our method is portable across different architectures and instruction sets. Our study indicates that two processors that look superficially similar in the common figures of merit can have radically different on-chip memory bandwidths, a fact that may be crucial to understanding observed application performance. Our results and methods will be made available on GitHub to aid the community in evaluating cache bandwidths.

International Conference for High Performance Computing, Networking, Storage and Analysis