University of Notre Dame
MengH042017D.pdf (2.04 MB)

Improving the Reproducibility of Scientific Applications with Execution Environment Specifications

thesis
posted on 2017-04-08, 00:00 authored by Haiyan Meng

Reproducibility, a main principle of the scientific method, has historically depended on text and proofs in a publication. However, as computation pervades science and changes how research is conducted, relying only on the experimental results in a publication cannot guarantee reproducibility. The execution environment in which the results were generated is another important ingredient and must also be preserved to reproduce the results. Unfortunately, execution environments for scientific work are often fragile and too complex to be well understood by researchers, let alone preserved.

This dissertation proposes two broad approaches for improving the reproducibility of scientific applications and explores their feasibility and applicability for both single-machine scientific applications and complex scientific workflows. The first approach wraps the minimal execution environment of an application into an all-in-one package. The second approach specifies the execution environment in an organized way, from hardware, kernel and OS all the way up to software, data and environment variables; preserves dependencies in units of basic OS image, software and data; and combines all the dependencies at runtime using mounting mechanisms.
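As a rough illustration of the second approach, the sketch below models an execution environment specification as a plain Python dictionary and derives the order in which preserved dependencies would be mounted at runtime. Every field name, path and version string here is hypothetical and chosen only for illustration; this is not the specification format or the prototype described in the dissertation.

```python
# Hypothetical specification: all keys, paths and versions are illustrative
# assumptions, not the format used by the dissertation's prototype.
spec = {
    "hardware": {"arch": "x86_64"},
    "kernel": {"name": "linux", "version": "2.6.32"},
    "os": {"name": "redhat", "version": "6.5", "image": "os/redhat-6.5.tar.gz"},
    "software": {
        "python-2.7": {
            "source": "software/python-2.7.tar.gz",
            "mountpoint": "/opt/python",
        },
    },
    "data": {
        "input.dat": {"source": "data/input.dat", "mountpoint": "/work/input.dat"},
    },
    "environ": {"PATH": "/opt/python/bin:/usr/bin:/bin"},
}


def mount_plan(spec):
    """Return (source, mountpoint) pairs: the OS image at /, then software, then data."""
    plan = [(spec["os"]["image"], "/")]
    for section in ("software", "data"):
        # Sort by name so the plan is deterministic for a given specification.
        for name in sorted(spec[section]):
            item = spec[section][name]
            plan.append((item["source"], item["mountpoint"]))
    return plan


def shared_sources(spec_a, spec_b):
    """Return the preserved sources that two specifications have in common."""
    sources_a = {source for source, _ in mount_plan(spec_a)}
    sources_b = {source for source, _ in mount_plan(spec_b)}
    return sources_a & sources_b
```

Because each dependency is referenced by its source rather than copied into the package, two specifications that name the same OS image or software archive could share a single stored copy, which `shared_sources` makes visible.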

For each approach, a prototype was implemented and three aspects were explored: what to preserve, how to preserve it, and how to reproduce it. The time and space overheads of preserving and reproducing applications, and the correctness of the preserved artifacts, were evaluated using applications from high energy physics, bioinformatics, epidemiology and scene rendering. The evaluation results show that both approaches allow researchers to reproduce an application and verify its results. However, the second approach avoids storing shared dependencies repeatedly and makes it easier to extend the original work.

This work contributes by demonstrating the importance of execution environments for the reproducibility of scientific applications and by distinguishing execution environment specifications, which should be lightweight, persistent and deployable, from the various tools used to create execution environments, which may change frequently as technology evolves. It proposes two preservation approaches, with prototypes, for both result verification and research extension, and provides recommendations on how to build reproducible scientific applications from the start.

History

Date Created

2017-04-08

Date Modified

2018-11-01

Defense Date

2017-03-22

Research Director(s)

Douglas Thain

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Program Name

  • Computer Science and Engineering
