University of Notre Dame

A Workflow Management System to Facilitate Reproducibility of Scientific Computing Applications

thesis
posted on 2018-04-09, 00:00 authored by Peter Ivie

Reproducibility is becoming an increasingly challenging requirement of the scientific process. Compared to more human-intensive scientific procedures, it would seem that scientific applications executed on computers could easily produce identical results despite slight changes to hardware, software, or simply timing. However, implicit dependencies on data and execution environment, coupled with ambiguous definitions of identity and equivalence throughout the process, make reproducibility rarely possible. To address this problem, I created PRUNE, the Preserving Run Environment. In PRUNE, every task to be executed is wrapped in a functional interface and coupled with a strictly defined environment. With this information, PRUNE can directly execute each task. As a scientific workflow evolves in PRUNE, a growing but immutable tree of derived data is created. The provenance of every item in the system can be precisely described, facilitating sharing and modification between collaborating researchers, along with efficient management of limited storage space. I show that with a minimal amount of overhead, these capabilities can be available for large-scale and complex workflows, such as an analysis of high-energy physics data, a bioinformatics application, and processing of U.S. census data. PRUNE also minimizes the cost of collaborative development of computational science.
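
The idea of wrapping each task in a functional interface, pairing it with a fixed environment, and recording provenance in an immutable store can be illustrated with a small Python sketch. This is not PRUNE's actual interface: `Store`, `TaskRecord`, `content_id`, and `sort_numbers` are hypothetical names invented for illustration, and the environment is reduced to an opaque identifier string rather than a real, strictly defined execution environment.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


def content_id(payload: bytes) -> str:
    """Name a payload by the SHA-256 digest of its content."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class TaskRecord:
    """Provenance of one derived item (hypothetical structure)."""
    command: str
    environment_id: str  # assumption: an opaque label standing in for the environment
    input_ids: Tuple[str, ...]


class Store:
    """Write-once store: an item is never overwritten after it is created."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._provenance: Dict[str, TaskRecord] = {}

    def put_source(self, payload: bytes) -> str:
        """Add original (non-derived) data and return its identifier."""
        item_id = content_id(payload)
        self._data.setdefault(item_id, payload)
        return item_id

    def run(self, command: str, func: Callable[..., bytes],
            environment_id: str, input_ids: Sequence[str]) -> str:
        """Execute a task unless an identical task was already executed."""
        record = TaskRecord(command, environment_id, tuple(input_ids))
        # The result is named by the task description, not by when it ran.
        description = json.dumps([command, environment_id, list(input_ids)])
        task_id = content_id(description.encode())
        if task_id not in self._data:
            inputs = [self._data[i] for i in record.input_ids]
            self._data[task_id] = func(*inputs)
            self._provenance[task_id] = record
        return task_id

    def get(self, item_id: str) -> bytes:
        return self._data[item_id]

    def lineage(self, item_id: str) -> List[TaskRecord]:
        """Return the task records leading to an item, nearest first."""
        record = self._provenance.get(item_id)
        if record is None:
            return []  # source data has no derivation
        history = [record]
        for parent in record.input_ids:
            history.extend(self.lineage(parent))
        return history


def sort_numbers(payload: bytes) -> bytes:
    """Example task: sort whitespace-separated integers."""
    numbers = sorted(payload.decode().split(), key=int)
    return " ".join(numbers).encode()


if __name__ == "__main__":
    store = Store()
    source = store.put_source(b"3 1 2")
    result = store.run("sort", sort_numbers, "env-a", [source])
    print(store.get(result))
    print(store.lineage(result))
```

In this sketch, repeating a task with the same command, environment identifier, and inputs returns the existing item instead of recomputing it, while changing the environment identifier yields a distinct item. That is one simple way to make "identity" explicit, under the assumptions stated above.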

History

Date Created

2018-04-09

Date Modified

2018-11-02

Defense Date

2018-03-28

Research Director(s)

Douglas Thain

Committee Members

Gregory Madey, Scott Emrich, Kevin Lannon

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Rights Statement

https://creativecommons.org/licenses/by-nc/4.0/

Additional Groups

  • Computer Science and Engineering

Program Name

  • Computer Science and Engineering
