A Compiler Toolchain for Distributed Data Intensive Scientific Workflows
With the growing amount of computational resources available to researchers today and the explosion of scientific data in modern research, it is imperative that scientists be able to construct data processing applications that harness these vast computing systems. To address this need, I propose applying concepts from traditional compilers, linkers, and profilers to the construction of distributed workflows and evaluate this approach by implementing a compiler toolchain that allows users to compose scientific workflows in a high-level programming language.
In this dissertation, I describe the execution and programming model of this compiler toolchain. Next, I examine four compiler optimizations and evaluate their effectiveness at improving the performance of various distributed workflows. Afterwards, I present a set of linking utilities for packaging workflows and a group of profiling tools for analyzing and debugging workflows. Finally, I discuss modifications made to the run-time system to support features such as enhanced provenance information and garbage collection. Altogether, these components form a compiler toolchain that demonstrates the effectiveness of applying traditional compiler techniques to the challenges of constructing distributed data intensive scientific workflows.
History
Date Modified
2017-06-05
Defense Date
2012-06-07
Research Director(s)
Douglas Thain
Committee Members
Patrick Flynn, Scott Emrich, Jesus Izaguirre
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Language
- English
Alternate Identifier
etd-06242012-095705
Publisher
University of Notre Dame
Additional Groups
- Computer Science and Engineering
Program Name
- Computer Science and Engineering