A Compiler Toolchain for Distributed Data Intensive Scientific Workflows

Doctoral Dissertation


With the growing amount of computational resources available to researchers today and the explosion of scientific data in modern research, it is imperative that scientists be able to construct data processing applications that harness these vast computing systems. To address this need, I propose applying concepts from traditional compilers, linkers, and profilers to the construction of distributed workflows and evaluate this approach by implementing a compiler toolchain that allows users to compose scientific workflows in a high-level programming language.

In this dissertation, I describe the execution and programming model of this compiler toolchain. Next, I examine four compiler optimizations and evaluate their effectiveness at improving the performance of various distributed workflows. Afterwards, I present a set of linking utilities for packaging workflows and a group of profiling tools for analyzing and debugging workflows. Finally, I discuss modifications made to the run-time system to support features such as enhanced provenance information and garbage collection. Altogether, these components form a compiler toolchain that demonstrates the effectiveness of applying traditional compiler techniques to the challenges of constructing distributed data intensive scientific workflows.


Identifier: etd-06242012-095705
Author: Peter James Bui
Advisor: Douglas Thain
Committee: Douglas Thain (Chair); Patrick Flynn, Scott Emrich, Jesus Izaguirre (Members)
Degree Level: Doctoral Dissertation
Degree Discipline: Computer Science and Engineering
Degree Name: PhD
Defense Date: 2012-06-07
Submission Date: 2012-06-24
Country: United States of America
Keywords: compiler, distributed systems, workflows, python
Institution: University of Notre Dame
Language: English
Record Visibility and Access: Public
Content License: All rights reserved
