Abstractions for Scientific Computing on Campus Grids

Moretti, Christopher M.

doi:10.7274/k0698625s64

MorettiC062010.pdf (794.6 kB)

thesis

posted on 2010-06-18, 00:00 authored by Christopher M. Moretti

Scientific computing users often find it difficult to transform serial domain applications into workloads for large, non-dedicated, heterogeneous campus grids. Because of hardware and software bottlenecks, a workload that succeeds on 8 nodes can fail disastrously on 128, or can even fail on 8 nodes for a different instance of the same problem. An abstraction is a flexible solution to a pattern of computation that lets non-experts harness distributed computing resources more easily. Users provide the pieces, such as their datasets and a serial function, and the workload is constructed and executed for them in a manner appropriate to the environment, so as to prevent disastrous configurations and satisfy cost, policy, and performance constraints. This work presents the design, implementation, and evaluation of a 'toolbox' of abstractions: All-Pairs, Sparse-Pairs, and Data-Split-Join. These abstractions are applied to several problems in bioinformatics, biometrics, and data mining. The discussion of each abstraction covers modeling the problem, managing input data, organizing computation on the campus grid, and managing output data. Results include the largest known biometrics All-Pairs result of its kind, in which over two years' worth of computation was executed in 10 days, and a complete alignment of the human genome using Sparse-Pairs, which completed in 2.5 hours on over 1000 hosts with a 952x speedup.
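
As a rough illustration of the idea that users supply only datasets and a serial function, the following is a minimal local sketch, not the thesis implementation. It assumes the usual reading of All-Pairs: given two sets and a function, produce a matrix whose entry (i, j) is the function applied to item i of the first set and item j of the second. The names all_pairs and partition_blocks, and the block-based splitting, are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def all_pairs(set_a: Sequence[A], set_b: Sequence[B],
              func: Callable[[A, B], R]) -> List[List[R]]:
    """Serial reference: M[i][j] = func(set_a[i], set_b[j])."""
    return [[func(a, b) for b in set_b] for a in set_a]


def partition_blocks(n_rows: int, n_cols: int,
                     block: int) -> List[Tuple[int, int, int, int]]:
    """Split the result matrix into rectangular sub-blocks.

    Each tuple is (row_start, row_end, col_start, col_end) with
    exclusive ends. In a distributed setting, each block could be one
    task; how blocks are sized and placed is an assumption here, not
    something the abstract specifies.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    return [
        (r, min(r + block, n_rows), c, min(c + block, n_cols))
        for r in range(0, n_rows, block)
        for c in range(0, n_cols, block)
    ]


def all_pairs_blocked(set_a: Sequence[A], set_b: Sequence[B],
                      func: Callable[[A, B], R],
                      block: int) -> List[List[R]]:
    """Compute the same matrix block by block, as independent tasks would."""
    result: List[List[R]] = [[None] * len(set_b) for _ in set_a]  # type: ignore[list-item]
    for r0, r1, c0, c1 in partition_blocks(len(set_a), len(set_b), block):
        for i in range(r0, r1):
            for j in range(c0, c1):
                result[i][j] = func(set_a[i], set_b[j])
    return result
```

In this sketch the caller writes only the comparison function, and the blocked variant yields the same matrix as the serial one; choosing the block size and deciding where the input data lives are the kinds of decisions the abstract says are made on the user's behalf.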

History

Date Modified

2017-06-02

Defense Date

2010-04-28

Research Director(s)

Douglas L. Thain

Committee Members

Nitesh Chawla, Christian Poellabauer, Scott Emrich

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Language

English

Alternate Identifier

etd-06182010-140851

Publisher

University of Notre Dame

Program Name

Computer Science and Engineering

Keywords

computing abstractions campus grid distributed systems distributed computing
