A Workflow Management System to Facilitate Reproducibility of Scientific Computing Applications

Doctoral Dissertation

Abstract

Reproducibility is becoming an increasingly challenging requirement of the scientific process. Compared to more human-intensive scientific procedures, it would seem that scientific applications executed on computers could easily produce identical results despite slight changes to hardware, software, or simply timing. However, implicit dependencies on data and execution environment, coupled with ambiguous definitions of identity and equivalence throughout the process, make reproducibility rarely possible. To address this problem, I created PRUNE, the Preserving Run Environment. In PRUNE, every task to be executed is wrapped in a functional interface and coupled with a strictly defined environment. With this information, PRUNE can directly execute each task. As a scientific workflow evolves in PRUNE, a growing but immutable tree of derived data is created. The provenance of every item in the system can be precisely described, facilitating sharing and modification between collaborating researchers, along with efficient management of limited storage space. I show that with a minimal amount of overhead, these capabilities can be available for large-scale and complex workflows, such as an analysis of high-energy physics data, a bioinformatics application, and processing of U.S. census data. PRUNE also minimizes the cost of collaborative development of computational science.
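
The ideas in the abstract (tasks wrapped in a functional interface, a strictly defined environment, an immutable tree of derived data, and precise provenance) can be illustrated with a small sketch. This is not PRUNE's actual interface: the names ToyStore, put, run, and lineage, the use of SHA-256 content identifiers, and the environment dictionary are all assumptions made for illustration only.

```python
import hashlib
import json


def content_id(payload: bytes) -> str:
    """Identify data by a hash of its content (an assumed scheme, for illustration)."""
    return hashlib.sha256(payload).hexdigest()


class ToyStore:
    """Hypothetical in-memory store; stored items are never overwritten."""

    def __init__(self):
        self.data = {}        # data id -> bytes
        self.provenance = {}  # data id -> record of the task that produced it
        self.task_cache = {}  # task id -> id of the output it produced

    def put(self, payload: bytes) -> str:
        data_id = content_id(payload)
        self.data.setdefault(data_id, payload)  # existing entries stay untouched
        return data_id

    def run(self, func, env: dict, input_ids: list) -> str:
        # A task is described by its function, its environment, and its inputs.
        # Simplification: the function is identified by name only, not by its code.
        record = {"function": func.__name__, "environment": env, "inputs": input_ids}
        task_id = content_id(json.dumps(record, sort_keys=True).encode())
        if task_id in self.task_cache:
            return self.task_cache[task_id]  # same task description: reuse the result
        result = func(*[self.data[i] for i in input_ids])
        out_id = self.put(result)
        self.provenance.setdefault(out_id, record)
        self.task_cache[task_id] = out_id
        return out_id

    def lineage(self, data_id: str) -> list:
        """List the functions that led to an item, oldest first."""
        record = self.provenance.get(data_id)
        if record is None:
            return []  # original input data has no producing task
        steps = []
        for parent in record["inputs"]:
            steps.extend(self.lineage(parent))
        steps.append(record["function"])
        return steps


def to_upper(raw: bytes) -> bytes:
    return raw.upper()


def count_bytes(raw: bytes) -> bytes:
    return str(len(raw)).encode()


store = ToyStore()
env = {"python": "3.x"}  # stand-in for a strictly defined environment
raw_id = store.put(b"census sample")
upper_id = store.run(to_upper, env, [raw_id])
count_id = store.run(count_bytes, env, [upper_id])
print(store.lineage(count_id))
```

In this sketch, repeating a task with the same function, environment, and inputs returns the stored result instead of recomputing it, and the lineage of any derived item can be traced back to its original inputs. A real system would also have to capture the function's code and the full execution environment, which the sketch leaves out.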

Attributes

Attribute Name Values
Author Peter Ivie
Contributor Gregory Madey, Committee Member
Contributor Scott Emrich, Committee Member
Contributor Douglas Thain, Research Director
Contributor Kevin Lannon, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name Doctor of Philosophy
Defense Date
  • 2018-03-28

Submission Date 2018-04-09
Subject
  • Scientific Workflows

  • Reproducibility

  • Workflow Management Systems

  • Environments

  • Replication

  • Preservation

  • Scientific Computing

  • Big Data

  • Workflow Management System

Language
  • English

Record Visibility Public
Content License
Departments and Units

Digital Object Identifier

doi:10.7274/6w924b3209b

This DOI is the best way to cite this doctoral dissertation.
