Understanding Dramatic Performance Differences in Workflow/Middleware/Site Combinations: CCL Technical Report October 15th, 2018

Article

Abstract

Scientists who run workflows often have access to both High Performance Computing (HPC) and High-Throughput Computing (HTC) sites, but the architecture of HPC sites is less conducive to HTC paradigms. The choice of middleware and site can drastically change how a given workflow performs. To explore these differences, we created tools that expand the capabilities of Makeflow and Work Queue. We then performed four speed-of-light tests, measuring job dispatch rate, data delivery from the master to the worker, system bandwidth, and metadata operations. Next, we ran three synthetic workflow tests: a purely data-consumptive workflow, a data-selectivity workflow, and a data-generating workflow. Finally, we tested our middleware with three real-world workflows: BWA-GATK, BLAST, and Lifemapper. We created a short guide that helps users match site, workflow, and middleware.

Attributes

Attribute Name    Values
Creator
  • Kyle Sweeney

  • Douglas Thain

Subject
  • MPI

  • Distributed Computing

  • Workflows

  • Makeflow

  • WorkQueue

Date Created
  • 2018-11-05

Language
  • English

Departments and Units
Record Visibility and Access
  • Public
Content License

Digital Object Identifier

doi:10.7274/r0-zmks-fe74

This DOI is the best way to cite this article.

