MorettiC062010.pdf (794.6 kB)
Abstractions for Scientific Computing on Campus Grids
thesis
posted on 2010-06-18, 00:00 authored by Christopher M. Moretti
Scientific computing users often find it difficult to transform serial domain applications into workloads for large non-dedicated heterogeneous campus grids. Due to hardware and software bottlenecks, a workload that succeeds on 8 nodes can fail disastrously on 128, or even fail on 8 nodes for a different instance of the same problem. An abstraction is a flexible solution to a pattern of computation that lets non-experts harness distributed computing resources more easily. The users provide the pieces, such as their datasets and serial function, and the workload is constructed and executed for them in a manner appropriate to the environment, in order to prevent disastrous configurations and satisfy cost, policy, and performance constraints. This work presents the design, implementation, and evaluation of a 'toolbox' of abstractions: All-Pairs, Sparse-Pairs, and Data-Split-Join. These abstractions are used for several problems in bioinformatics, biometrics, and data mining. The discussion of the abstractions includes modeling of the problem, managing input data, organizing computation on the campus grid, and managing output data. Results include the largest known biometrics All-Pairs result of its kind, in which over two years' worth of computation was executed in 10 days, and a complete alignment of the human genome using Sparse-Pairs, which completed in 2.5 hours on over 1000 hosts with 952x speedup.
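As a rough illustration of the user-facing contract described in the abstract (the user supplies datasets and a serial function; the system builds the workload), the All-Pairs pattern can be sketched as a function that applies a comparison to every pairing of two sets and returns the resulting matrix. This is a serial reference sketch only, not the thesis's implementation: the names `all_pairs` and `hamming` and the sample strings are hypothetical, and nothing here models the data distribution or grid scheduling that the thesis addresses.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def all_pairs(
    set_a: Sequence[A],
    set_b: Sequence[B],
    func: Callable[[A, B], R],
) -> List[List[R]]:
    """Return matrix M where M[i][j] = func(set_a[i], set_b[j]).

    Serial reference version (an assumption for illustration): a grid
    implementation would partition this index space across hosts.
    """
    return [[func(a, b) for b in set_b] for a in set_a]


def hamming(x: str, y: str) -> int:
    """Hypothetical user-supplied serial function: count of differing positions."""
    return sum(cx != cy for cx, cy in zip(x, y))


if __name__ == "__main__":
    # Hypothetical sample data standing in for a user's dataset.
    reads = ["ACGT", "ACGA", "TCGA"]
    for row in all_pairs(reads, reads, hamming):
        print(row)
```

The point of the sketch is the separation of concerns: the caller writes only `hamming` and provides the data, while everything about how the pairings are enumerated stays behind the `all_pairs` interface.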
History
Date Modified
2017-06-02
Defense Date
2010-04-28
Research Director(s)
Douglas L. Thain
Committee Members
Nitesh Chawla, Christian Poellabauer, Scott Emrich
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Language
- English
Alternate Identifier
etd-06182010-140851
Publisher
University of Notre Dame
Program Name
- Computer Science and Engineering