Abstractions for Scientific Computing on Campus Grids

Doctoral Dissertation

Abstract

Scientific computing users often find it difficult to transform serial domain applications into workloads for large, non-dedicated, heterogeneous campus grids. Because of hardware and software bottlenecks, a workload that succeeds on 8 nodes can fail disastrously on 128, or can even fail on 8 nodes for a different instance of the same problem.

An abstraction is a flexible solution to a pattern of computation that lets non-experts harness distributed computing resources more easily. Users provide the pieces, such as their datasets and a serial function, and the workload is constructed and executed for them in a manner appropriate to the environment, in order to prevent disastrous configurations and to satisfy cost, policy, and performance constraints.

This work presents the design, implementation, and evaluation of a “toolbox” of abstractions: All-Pairs, Sparse-Pairs, and Data-Split-Join. These abstractions are used for several problems in bioinformatics, biometrics, and data mining. The discussion of each abstraction covers modeling the problem, managing input data, organizing computation on the campus grid, and managing output data. Results include the largest known biometrics All-Pairs result of its kind, in which over two years’ worth of computation was executed in 10 days, and a complete alignment of the human genome using Sparse-Pairs, which completed in 2.5 hours on over 1000 hosts with a 952x speedup.
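As a rough illustration of the All-Pairs pattern named above, the following minimal sketch applies a user-supplied serial function to every pair drawn from two datasets and collects the results in a matrix. It is not the dissertation's implementation: the names `all_pairs` and `hamming` are hypothetical, and a local thread pool stands in for the campus grid purely as an assumption for the sake of a runnable example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def all_pairs(set_a, set_b, compare, max_workers=4):
    """Apply compare(a, b) to every pair and return a len(set_a) x len(set_b) matrix.

    Hypothetical helper: a local thread pool is an assumed stand-in for
    the distributed resources an abstraction would manage for the user.
    """
    # Enumerate every (row, column) index pair in the result matrix.
    index_pairs = list(product(range(len(set_a)), range(len(set_b))))
    matrix = [[None] * len(set_b) for _ in set_a]

    def run(ij):
        i, j = ij
        return compare(set_a[i], set_b[j])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with index_pairs.
        for (i, j), value in zip(index_pairs, pool.map(run, index_pairs)):
            matrix[i][j] = value
    return matrix


def hamming(a, b):
    """Example serial function: count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


result = all_pairs(["abc", "abd"], ["abc", "xyz"], hamming)
print(result)
```

The user supplies only the two datasets and the comparison function; how the pairs are partitioned and scheduled is left to the abstraction, which is the division of labor the abstract describes.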

Attributes

Attribute Name    Values
URN
  • etd-06182010-140851

Author Christopher M. Moretti
Advisor Douglas L. Thain
Contributor Nitesh Chawla, Committee Member
Contributor Douglas L. Thain, Committee Chair
Contributor Christian Poellabauer, Committee Member
Contributor Scott Emrich, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name PhD
Defense Date
  • 2010-04-28

Submission Date 2010-06-18
Country
  • United States of America

Subject
  • computing abstractions
  • campus grid
  • distributed systems
  • distributed computing

Publisher
  • University of Notre Dame

Language
  • English

Record Visibility and Access Public
Content License
  • All rights reserved

Departments and Units
