GPU Based Acceleration Techniques: Algorithms, Implementations, and Applications

Xiao, Kai

doi:10.7274/4m90dv15x27

XiaoK062015D.pdf (4.45 MB)

thesis

posted on 2016-02-18, 00:00 authored by Kai Xiao

Shared memory many-core processors such as GPUs have been used extensively to accelerate computation-intensive algorithms and applications. When porting existing algorithms from sequential or other parallel architecture models to shared memory many-core architectures, non-trivial modifications are often needed to match the execution patterns of the target algorithms with the characteristics of many-core architectures. This dissertation presents a collection of methods and techniques for accelerating several important applications on GPUs, including radiation dose calculation, ray tracing based graphics rendering, and nearest neighbor search. Specifically, we study the performance issues of ray traversal in spatially decomposed scenes and propose a new data structure, called Shell, that completely eliminates the expensive hierarchical search operations. We also develop an efficient GPU implementation of the Three Dimensional Digital Differential Analyzer (3D-DDA) algorithm, which avoids the overhead of execution divergence by replacing the nested conditional instructions with a set of simple operations. These two methods are used to accelerate the Collapsed Cone Convolution Superposition (CCCS) algorithm, which is the clinical choice for dose calculation in radiation treatment planning systems. Furthermore, we present a locality enhancing method for the Monte Carlo based ray tracing (MCBRT) algorithm on CPU-GPU heterogeneous systems, which improves spatial and temporal data locality by organizing random rays into coherent groups. Finally, we propose a series of techniques to accelerate nearest neighbor search on GPUs, including a GPU-cache efficient data structure (k-pack tree), a coherent parallel search algorithm, and a cost model based performance optimization method.
For each of the target applications, our proposed approaches provide non-trivial speedups over the state-of-the-art work, e.g., 6–8X in Monte Carlo dose calculation and 3.5–5.5X in graphics ray tracing. Our techniques can be implemented in various parallel programming models, such as CUDA and OpenCL, and are applicable to many modern GPU and many-core architectures, including NVIDIA Kepler/Maxwell, AMD GCN, and Intel Xeon Phi.
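The abstract does not give the dissertation's actual 3D-DDA formulation, so the following is only a minimal sketch of the general idea: stepping a ray through a grid of unit voxels while selecting the stepping axis with arithmetic masks instead of nested conditionals. The function name `traverse`, the constant `BIG`, the unit-voxel grid, and the tie-breaking order (x, then y, then z) are all assumptions made for illustration. Python is used for readability; it does not model GPU warp divergence, it only shows the arithmetic that would replace the branches in the inner loop.

```python
import math

# Assumption: a large finite value stands in for "never crosses a boundary"
# on axes the ray is parallel to. A true infinity would give inf * 0 = nan
# in the masked update below.
BIG = 1e30


def traverse(origin, direction, grid_size, max_steps=1024):
    """Return the voxels a ray visits in a grid of unit cubes (hypothetical helper)."""
    if not any(direction):
        raise ValueError("direction must be non-zero")

    # Per-ray set-up (done once, so a branch here is not in the hot loop).
    voxel = [math.floor(o) for o in origin]
    step, t_max, t_delta = [], [], []
    for o, d, v in zip(origin, direction, voxel):
        s = (d > 0) - (d < 0)          # sign of d as -1, 0 or +1
        step.append(s)
        if d == 0:
            t_max.append(BIG)
            t_delta.append(BIG)
        else:
            boundary = v + (s > 0)     # next voxel boundary along this axis
            t_max.append((boundary - o) / d)
            t_delta.append(abs(1.0 / d))

    visited = []
    for _ in range(max_steps):
        if not all(0 <= v < n for v, n in zip(voxel, grid_size)):
            break
        visited.append(tuple(voxel))

        # Exactly one of mx, my, mz is 1: the axis with the smallest t_max.
        # Comparisons become 0/1 integers, so no nested if/else is needed.
        tx, ty, tz = t_max
        mx = int(tx <= ty) * int(tx <= tz)
        my = (1 - mx) * int(ty <= tz)
        mz = 1 - mx - my

        # Masked update: axes with mask 0 are left unchanged.
        for axis, m in enumerate((mx, my, mz)):
            voxel[axis] += m * step[axis]
            t_max[axis] += m * t_delta[axis]
    return visited
```

For example, `traverse((0.5, 0.25, 0.5), (1, 1, 0), (3, 3, 3))` walks a staircase through the z = 0 layer, alternating x and y steps. On a GPU the same comparisons would typically map to predicated or select instructions, so all threads in a warp execute the same instruction sequence regardless of which axis each ray steps along.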

History

Date Modified

2017-06-05

Defense Date

2015-06-24

Research Director(s)

X. Sharon Hu

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Program Name

Computer Science and Engineering

Keywords

GPU, CUDA, OpenCL, computer graphics, computing, data structure, radiation dose calculation
