Non-Model Transcriptomics: Applications, Assessments, and Algorithms

Doctoral Dissertation


Transcriptome sequencing (sequencing only the protein-coding genes of a genome) has greatly expanded our ability to understand the biology of life on Earth. While full genome sequencing is still prohibitively expensive for many species, sequencing genes alone provides direct access to the most functional elements of a genome for a fraction of the cost. This advance brings broad genetic resources to those studying species of even the most specialized interest.

We use these techniques to answer an important ecological question: How will species react to climate change? It has been assumed that as the climate warms, populations will simply shift poleward to compensate. Unfortunately, previous research shows that some populations of two butterfly species are adapted to local conditions and may not respond this way. To discover the genetic basis of these results, we first sequenced, assembled, annotated, and analyzed the butterfly transcriptomes. Because sequences were sampled from wild-caught populations, we developed novel methods to ensure high-quality results in this setting. We then designed custom microarrays to measure how much of every gene is expressed in a given experimental setting. These arrays, coupled with a robust experimental design, revealed a variety of genes and functional categories carrying the signature of local adaptation to climate.

The bioinformatic difficulties associated with such projects are many. In particular, transcriptome assembly presents unique challenges, and it is not yet clear how to quantitatively evaluate assemblies. By simulating sequencing and comparing assembler results to those of perfect assemblies, we evaluate a number of commonly used and novel quality metrics. This study reveals that some quality metrics reflect biological accuracy while others (such as contig N50 length) do not, and it provides vital information for researchers making use of transcriptome data.
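For readers unfamiliar with the contig N50 length mentioned above, the following is a minimal sketch of how the metric is commonly computed: N50 is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The function name `n50` and the toy contig lengths are illustrative assumptions, not taken from the dissertation. The toy comparison shows one possible intuition for why a length-based metric need not track accuracy (wrongly merging contigs raises N50); it is not necessarily the argument made in the dissertation itself.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 for empty input).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical assembly: four transcripts assembled as separate contigs.
    separate = [1000, 1000, 1000, 1000]
    # Hypothetical misassembly: the same bases wrongly joined into two chimeras.
    chimeric = [2000, 2000]
    print(n50(separate))  # 1000
    print(n50(chimeric))  # 2000, higher despite the incorrect joins
```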

Finally, when sequences are sourced from many genetically diverse individuals, our tools would ideally reveal this diversity rather than produce a simple genetic consensus. To this end, we develop algorithms to separately assemble diverse sequences (haplotypes) accurately in the face of both sequencing error and data ambiguity. These methods will help reveal biodiversity in applications ranging from community ecology to epidemiology.


Attribute Name Values
  • etd-04182012-093508

Author Shawn Thomas O'Neil
Advisor Scott J. Emrich
Contributor Jessica J. Hellmann, Committee Co-Chair
Contributor Michael Pfrender, Committee Member
Contributor Laurel Riek, Committee Member
Contributor Scott J. Emrich, Committee Chair
Contributor Kevin W. Bowyer, Committee Member
Contributor Jason McLachlan, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name Doctor of Philosophy
Defense Date 2012-04-05

Submission Date 2012-04-18
  • United States of America

  • assembly

  • bioinformatics

  • transcriptome

  • University of Notre Dame

  • English

Record Visibility Public
Content License All rights reserved
