Scaling Collaborative Bioinformatics

Master's Thesis


Though many common bioinformatics problems are amenable to parallelization, and large datasets are becoming the norm for biological inquiry, biologists generally lack the skills needed to automate, parallelize, and scale their workflows effectively. This document describes contributions to bioinformatics ranging from collaborative frameworks, to the automation of common workflows, to the development of novel algorithms. We begin by describing Biocompute, a web portal that overcomes challenges in user interface design and resource sharing to facilitate collaboration among systems programmers, bioinformatics software developers, and biologists. Next, we highlight several parallel workflow implementations developed to serve the needs of the University of Notre Dame’s biologists. These implementations leverage insights from both biology and distributed systems. In building them, we encountered and solved several practical challenges on the path to scaling up. We close by introducing a bioinformatics algorithm that detects loci-specific selective pressure favoring codon rarity in ortholog groups that span Archaea, Prokaryota, and Eukaryota.


Attribute Name Values
  • etd-04192013-135548

Author Rory Carmichael
Advisor Scott Emrich
Contributor Douglas Thain, Committee Member
Contributor Scott Emrich, Committee Member
Contributor Patricia Clark, Committee Member
Degree Level Master's Thesis
Degree Discipline Computer Science and Engineering
Degree Name MS
Defense Date 2013-04-05

Submission Date 2013-04-19
  • United States of America

  • rare codon

  • tail weight

  • University of Notre Dame

  • English

Record Visibility Public
Content License All rights reserved


