The Challenges of Scaling Up High-Throughput Workflow with Container Technology

Doctoral Dissertation

Abstract

High-throughput computing (HTC) uses a large amount of computing resources over a long period to complete many independent, parallel computational tasks. HTC workloads are often described as workflows and run on distributed systems through workflow systems. However, because most workflow systems are not responsible for managing the task execution environment, HTC workflows are usually confined to dedicated HTC facilities that provide the required settings.

Lately, container runtimes have been widely deployed across public clouds because of their ability to deliver execution environments with lower overhead than virtual machines. This trend gives users of HTC workflows an opportunity to use the virtually unlimited computing power of the cloud. However, migrating complex workflow systems to a container environment is cumbersome.

To containerize HTC workflows and scale them up on the cloud, I synthesize my experience with container technologies and develop a methodology built on seven design factors: i) Isolation Granularity: the granularity of isolation should be determined by the characteristics of the target workloads; ii) Container Management: container runtimes must be adapted to the distributed environment, and container management is best handled by the underlying distributed system; iii) Image Management: a cooperative mechanism can speed up image distribution and make it more efficient in a distributed environment; iv) Garbage Collection: timely garbage collection is necessary given the massive amount of intermediate data generated by HTC workflows; v) Network Connection: excessive network connections should be avoided given the large number of small transfers; vi) Resource Management: customized resource management mechanisms that fully account for the characteristics of the target workflow are required; vii) Cross-layer Cooperation: implementing advanced features requires cooperation between the upper-layer workflow system and the underlying cluster manager.

In addition to HTC workflows, I validate these factors through my work on standardizing the resource provisioning process for extreme-scale online workloads, and observe that they apply equally to HTC workflows and extreme-scale online workloads.


Attributes

Attribute Name  Values
Author Chao Zheng
Contributor Douglas L. Thain, Research Director
Contributor Christian Poellabauer, Committee Member
Contributor Dong Wang, Committee Member
Contributor Lukas Rupprecht, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name Doctor of Philosophy
Banner Code
  • PHD-CSE

Defense Date
  • 2019-08-22

Submission Date 2019-10-03
Subject
  • High-Throughput Computing
  • Cloud Computing
  • Distributed System

Language
  • English

Record Visibility and Access Public
Content License
  • All rights reserved

