Principles for the Design and Operation of Elastic Scientific Applications on Distributed Systems

Doctoral Dissertation

Abstract

Scientific applications often exploit the concurrency in their workloads by partitioning them into independent tasks to achieve reasonable performance. To improve performance at scale, the partitions are executed in parallel on large pools of resources in distributed computing systems, such as clouds, clusters, and grids. However, the exclusive and on-demand deployment of applications on these platforms presents challenges. The target hardware is unknown until runtime and varies between deployments. Operating parameters, such as the number of partitions and the number of instances to provision for execution, must therefore be determined at runtime for efficient operation.

In this work, I build and demonstrate elastic applications to provide the desired characteristics for operation on distributed computing systems. I present case studies of elastic applications from different scientific domains and draw broad observations on their design and the challenges to their efficient operation. I develop and evaluate techniques at the middleware and application layers to achieve efficient operation. In effect, the presented techniques create self-operating elastic applications that dynamically determine the partitions of their workloads and the scale of resources to utilize. I conclude by showing that self-operating applications achieve high time- and cost-efficiency in their deployed environments in distributed computing systems.

Attributes

Attribute Name  Values
URN
  • etd-04172015-014334

Author Dinesh Rajan Pandiarajan
Advisor Douglas Thain
Contributor Scott Emrich, Committee Member
Contributor Douglas Thain, Committee Chair
Contributor Aaron Striegel, Committee Member
Contributor Jesus Izaguirre, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name PhD
Defense Date
  • 2015-04-06

Submission Date 2015-04-17
Country
  • United States of America

Subject
  • resource allocation

  • cloud computing

  • workload partitioning

  • elastic applications

  • concurrent applications

  • scientific applications

  • data partitioning

  • distributed computing

Publisher
  • University of Notre Dame

Language
  • English

Record Visibility and Access Public
Content License
  • All rights reserved
