Computational reproducibility depends on the ability to isolate the necessary and sufficient computational artifacts and to preserve them for later re-execution. Both isolation and preservation are challenging because of the complexity of existing software and systems: implicit dependencies, resource distribution, and shifting compatibility between systems over time all conspire to break the reproducibility of an application. Sandboxing is a technique that has been used extensively in OS environments to isolate computational artifacts. Several recently proposed tools employ sandboxing as a mechanism to ensure reproducibility. However, none of these tools preserves the sandboxed application for redistribution to a larger scientific community, even though preservation and distribution are as crucial to reproducibility as sandboxing itself. In this paper, we describe a combined sandboxing and preservation framework that is efficient, invariant, and practical for large-scale reproducibility. We present case studies of complex high energy physics applications and show how the framework can be used to sandbox, preserve, and distribute applications. We report on the completeness, performance, and efficiency of the framework, and suggest possible standardization approaches.