University of Notre Dame

File(s) under permanent embargo

The Challenges of Scaling Up High-Throughput Workflow with Container Technology

thesis
posted on 2019-10-03, 00:00 authored by Chao Zheng

High-throughput computing (HTC) uses large amounts of computing resources over long periods to complete many independent, parallel computational tasks. HTC workloads are often described as workflows and run on distributed systems through workflow systems. However, because most workflow systems are not responsible for managing the task execution environment, HTC workflows are usually confined to dedicated HTC facilities that provide the required settings.

Lately, container runtimes have been widely deployed across public clouds because they deliver execution environments with lower overhead than virtual machines. This trend gives users of HTC workflows an opportunity to use the virtually unlimited computing power of the cloud. However, migrating complex workflow systems to a container environment is cumbersome.

To containerize HTC workflows and scale them up on the cloud, I synthesize my experience with container technologies and develop a methodology comprising seven design factors: i) Isolation Granularity: the granularity of isolation should be determined by the characteristics of the target workloads; ii) Container Management: container runtimes must be adapted to the distributed environment, and container management is best handled by the underlying distributed system; iii) Image Management: a cooperative mechanism can speed up image distribution and make it more efficient in a distributed environment; iv) Garbage Collection: timely garbage collection is necessary given the massive amount of intermediate data generated by HTC workflows; v) Network Connection: excessive network connections should be avoided given the large number of small transfers; vi) Resource Management: customized resource management mechanisms that fully account for the characteristics of the target workflow are required; vii) Cross-layer Cooperation: implementing advanced features requires cooperation between the upper-layer workflow system and the underlying cluster manager.

In addition to HTC workflows, I validate these factors through my work on standardizing the resource provisioning process for extreme-scale online workloads, and observe that they apply equally to HTC workflows and to extreme-scale online workloads.


History

Date Modified

2019-10-31

Defense Date

2019-08-22

CIP Code

  • 40.0501

Research Director(s)

Douglas L. Thain

Committee Members

Christian Poellabauer, Dong Wang, Lukas Rupprecht

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Alternate Identifier

1125224074

Library Record

5261608

OCLC Number

1125224074

Program Name

  • Computer Science and Engineering
