Though many common bioinformatics problems are amenable to parallelization and large datasets are becoming the norm for biological inquiry, biologists do not generally have the skillset to effectively automate, parallelize, and scale their workflows. This document describes contributions to bioinformatics ranging from collaborative frameworks to the automation of common workflows to the development of novel algorithms. We begin by describing Biocompute, a web portal that overcomes challenges in user interface design and resource sharing to facilitate collaborations between systems programmers, bioinformatics software developers, and biologists. Next, we highlight several parallel workflow implementations developed to serve the needs of the University of Notre Dame’s biologists. These leverage insights from both biology and distributed systems to achieve their goals. In implementing them we encountered and solved several practical challenges on the path to scaling up. We close with the introduction of a bioinformatics algorithm to detect loci-specific selective pressure favoring codon rarity in ortholog groups that span Archaea, Prokaryota, and Eukaryota.
Scaling Collaborative Bioinformatics
Master's Thesis
|Contributor||Douglas Thain, Committee Member|
|Contributor||Scott Emrich, Committee Member|
|Contributor||Patricia Clark, Committee Member|
|Degree Level||Master's Thesis|
|Degree Discipline||Computer Science and Engineering|
|Departments and Units|