Analysis and prediction of GPU graph algorithm performance

Abstract

In this thesis we investigate the relation between the structure of input graphs and the performance of GPU (Graphical Processing Unit) parallelisation strategies for graph processing algorithms. Accelerators in general, and Graphical Processing Units (GPUs) in particular, have become a staple of High Performance Computing (HPC) systems. As a result, the ability to analyse, understand, and predict the performance of GPU code is crucial to effectively use these HPC systems. Graph algorithms are widely applicable in many areas of science, but it is poorly understood how to map graph processing algorithms to GPUs. In this thesis we demonstrate that the structure of input graphs has significant effect on the performance of different parallelisation strategies. We show that it is infeasible to use analytical models to predict the best performing parallelisation strategy for a graph. We present a case study using PageRank and Breadth First Search (BFS) that shows that we can speed up their performance by using a Binary Decision Tree (BDT) model to predict the appropriate parallelisation strategies for a given graph. Additionally, we produced a software pipeline and a data set of our measurements that can be used by others to reproduce or expand upon our work.