University of Notre Dame
ShafferT112022D.pdf (1.67 MB)

Proactive Storage Management for High-Throughput Scientific Workflows

thesis
posted on 2022-11-15, 00:00 authored by Tim Shaffer

Effectively supporting modern large-scale and data-centric scientific workflows moving into the exascale era requires a change in the way researchers interact with the system to better coordinate IO behavior and storage across multiple layers of the system. Instead of targeting low-level storage and IO operations with broad optimizations, the key to supporting the wide variety of modern applications lies in leveraging higher-level information from workflow managers and execution frameworks. Rather than simply reacting to individual tasks as they execute, we can leverage workflow-level information in advance of execution to carry out proactive management decisions that avoid entire categories of performance bottlenecks from the start.

This dissertation first explores the interactions between workflow structure and application performance, and looks at leveraging imprecise user intuition to flexibly restructure workflows. Next, it examines the scalability limits of shared filesystems, showing how large-scale application behavior leads to 'metadata storms' that are difficult to handle without leveraging workflow information. It then applies advanced dependency management planning and dynamic resource management methods to distributed Python applications, demonstrating the benefits of workflow-aware techniques. It goes on to propose dependency-oriented container management strategies that improve storage usage and reproducibility over standard techniques that do not leverage workflow information. Finally, it introduces the Landlord algorithm, which addresses the combinatorial explosion in storage of 'container sprawl' by taking advantage of workflow behavior to coalesce related container environments. The approaches developed in this work give scientists a flexible toolset to leverage workflow-level information for proactive management across a wide variety of application types and computing environments.

History

Date Modified

2022-12-06

Defense Date

2022-09-30

CIP Code

  • 40.0501

Research Director(s)

Douglas L. Thain

Committee Members

Paul Brenner, Jarek Nabrzyski, Pete Kogge

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1353240814

Library Record

6304822

OCLC Number

1353240814

Program Name

  • Computer Science and Engineering
