University of Notre Dame

File(s) under permanent embargo

Non-Model Transcriptomics: Applications, Assessments, and Algorithms

thesis
posted on 2012-04-18, 00:00 authored by Shawn Thomas O'Neil

Transcriptome sequencing (sequencing only the expressed, protein-coding genes of a genome) has multiplied our ability to understand the biology of life on Earth. While full genome sequencing is still prohibitively expensive for many species, sequencing only the genes provides direct access to the most functional elements of a genome at a fraction of the cost. This advance brings broad genetic resources to researchers studying even species of the most specialized interest.

We use these techniques to answer an important ecological question: how will species respond to climate change? It has been assumed that as the climate warms, populations will simply shift poleward to compensate. However, previous research shows that some populations of two butterfly species are adapted to local conditions and may not respond this way. To discover the genetic basis of these results, we first sequenced, assembled, annotated, and analyzed the butterfly transcriptomes. Because sequences were sampled from wild-caught populations, we developed novel methods to ensure high-quality results in this setting. We then designed custom microarrays to measure the expression level of every gene under a given experimental condition. Combined with a robust experimental design, these arrays revealed a variety of genes and functional categories carrying the signature of local adaptation to climate.

The bioinformatic difficulties associated with such projects are many. In particular, transcriptome assembly presents unique challenges, and it is not yet clear how to evaluate assemblies quantitatively. By simulating sequencing and comparing assembler results with those of perfect assemblies, we evaluate a number of commonly used and novel quality metrics. This study reveals that some quality metrics reflect biological accuracy while others (such as contig N50 length) do not, and it provides vital information for researchers who use transcriptome data.
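For readers unfamiliar with contig N50: under its common definition, it is the largest length L such that contigs of length at least L together contain at least half of all assembled bases. The following is a minimal Python sketch of that standard definition, not code from the thesis; the function name `n50` and the contig lengths are hypothetical. The second call illustrates one way N50 can diverge from accuracy: wrongly joining two contigs into a single chimeric contig raises N50 even though the assembly has become less correct.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (standard definition).

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of all assembled bases.
    """
    # Sort longest first so the running total accumulates the biggest contigs.
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if not lengths:
        raise ValueError("need at least one contig of positive length")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Hypothetical assembly: five contigs totalling 1,000 bp.
print(n50([400, 300, 150, 100, 50]))  # 300

# Same bases, but the 300 bp and 150 bp contigs are wrongly merged (chimera).
print(n50([450, 400, 100, 50]))  # 400: higher N50, less accurate assembly
```

Because N50 looks only at contig lengths, it rewards longer contigs regardless of whether they correspond to real transcripts, which is consistent with the finding above that it does not track biological accuracy.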

Finally, when sequences are sourced from many genetically diverse individuals, our tools should ideally reveal this diversity rather than produce a simple genetic consensus. To this end, we develop algorithms to separately and accurately assemble diverse sequences (haplotypes) in the face of both sequencing error and data ambiguity. These methods will help reveal biodiversity in applications ranging from community ecology to epidemiology.

History

Date Modified

2017-06-05

Defense Date

2012-04-05

Research Director(s)

Scott J. Emrich

Committee Members

  • Michael Pfrender
  • Laurel Riek
  • Kevin W. Bowyer
  • Jason McLachlan

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Alternate Identifier

etd-04182012-093508

Publisher

University of Notre Dame

Program Name

  • Computer Science and Engineering
