Non-Model Transcriptomics: Applications, Assessments, and Algorithms

O'Neil, Shawn Thomas

doi:10.7274/2514nk33w4v

File(s) under permanent embargo

thesis

posted on 2012-04-18, 00:00 authored by Shawn Thomas O'Neil

Transcriptome sequencing (sequencing only the protein-coding genes of a genome) has multiplied our ability to understand the biology of life on Earth. While full genome sequencing is still prohibitively expensive for many species, sequencing only the genes provides direct access to the most functional elements of a genome for a fraction of the cost. This advance brings broad genetic resources to those studying species of even the most specialized interest.

We use these techniques to answer an important ecological question: How will species react to climate change? It has been assumed that as the climate warms, populations will simply shift poleward to compensate. Unfortunately, previous research shows that some populations of two butterfly species are adapted to local conditions and may not respond this way. To discover the genetic basis of these results, we first sequenced, assembled, annotated, and analyzed the butterfly transcriptomes. Because sequences were sampled from wild-caught populations, we developed novel methods to ensure high-quality results in this setting. We then designed custom microarrays to measure how much of every gene is expressed in a given experimental setting. These, coupled with a robust experimental design, revealed a variety of genes and functional categories carrying the signature of local adaptation to climate.

The bioinformatic difficulties associated with such projects are many. In particular, transcriptome assembly presents unique challenges, and it is not yet clear how to quantitatively evaluate assemblies. By simulating sequencing and comparing assembler results to those of perfect assemblies, we evaluate a number of commonly used and novel quality metrics. This study reveals that some quality metrics reflect biological accuracy while others (such as contig N50 length) do not, and it provides vital information for researchers making use of transcriptome data.
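For readers unfamiliar with the contig N50 length mentioned above, the following is a minimal sketch of the standard definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are hypothetical illustrations, not code or data from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Standard definition: sort lengths from longest to shortest and return
    the length of the contig at which the running total first reaches at
    least half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (not data from the thesis).
separate = [400, 300, 200, 100]
# The same bases, with the two longest contigs joined into one.
joined = [700, 200, 100]

print(n50(separate))  # 300
print(n50(joined))    # 700
```

In this made-up example, joining two contigs more than doubles N50 whether or not the join is biologically correct, which is one way to see how a length-based metric can diverge from accuracy.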

Finally, when sequences are sourced from many genetically diverse individuals, our tools would ideally reveal this diversity rather than produce a simple genetic consensus. To this end, we develop algorithms to separately assemble diverse sequences (haplotypes) accurately in the face of both sequencing error and data ambiguity. These methods will help reveal biodiversity in applications ranging from community ecology to epidemiology.

History

Date Modified

2017-06-05

Defense Date

2012-04-05

Research Director(s)

Scott J. Emrich

Committee Members

Michael Pfrender, Laurel Riek, Kevin W. Bowyer, Jason McLachlan

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Language

English

Alternate Identifier

etd-04182012-093508

Publisher

University of Notre Dame

Program Name

Computer Science and Engineering

Keywords

assembly, bioinformatics, transcriptome
