A Flexible Comparative Genomics Framework for Integrating Heterogeneous Sequence Data

Doctoral Dissertation

Abstract

Genome sequencing technologies have revolutionized biology in the past two decades, yet data analysis has lagged behind data production. In this thesis, we present a framework for analyzing genomic data in more flexible ways than previous techniques allow. First, the framework allows researchers to design analyses that compare genomic samples directly instead of relying on reference-relative variant calls, as most current tools do. Second, we provide utilities for examining both assembly data and resequencing data in the same analysis, whereas previous tools were restricted to either assembly data or resequencing data. Finally, our framework allows researchers to flexibly incorporate alignments to arbitrarily many reference sequences into their analysis.

We describe FlexReseq, the software implementation of this framework. FlexReseq allows researchers to easily customize resequencing analyses using a simple configuration file to define positions of interest. We give results from applications of these tools, including genotyping strains of Plasmodium falciparum, finding diversity and divergence between strains of Anopheles gambiae, detecting inversions based on assembly and alignment information from A. gambiae, and exploring resequencing analysis using alignments to multiple reference sequences.

Attributes

URN: etd-07222011-111630
Author: Allison Ann Penner Regier
Advisor: Scott J. Emrich
Contributor: Mihai Pop, Committee Member
Contributor: Scott J. Emrich, Committee Chair
Contributor: Frank Collins, Committee Member
Contributor: Kevin Bowyer, Committee Member
Contributor: Nora Besansky, Committee Member
Degree Level: Doctoral Dissertation
Degree Discipline: Computer Science and Engineering
Degree Name: PhD
Defense Date: 2011-07-07
Submission Date: 2011-07-22
Country: United States of America
Subject: anopheles gambiae; plasmodium falciparum; bioinformatics
Publisher: University of Notre Dame
Language: English
Record Visibility and Access: Public
Content License: All rights reserved
