posted on 2008-10-20, 00:00, authored by Allison A.P. Regier
As the cost of DNA sequencing falls, the relative cost of finishing steps (e.g., error correction and gap closing) is increasing. As a result, many genome projects are completed only to a draft stage and may not provide full information about the location of sequences on the chromosome. Draft assemblies may also contain gaps and assembly errors. Whether draft or finished, the output of a genome sequencing project serves as the input to a host of analysis tools, such as gene finders and variation-analysis tools. Many of these tools were designed for and tested on high-quality finished genomes, such as the human genome or that of the fruit fly Drosophila melanogaster. In this thesis we discuss specific challenges in working with draft genomes and show how methods can be adapted to be more effective on them. First, we examine computational methods for finding errors in draft assemblies. Next, we modify a technique for finding DNA inversions between two genomes so that it accounts for gaps in the genomes. Finally, we develop a pipeline that constructs chromosomes from draft scaffolds using a closely related reference genome. We use examples from three species of importance to global health: the body louse (Pediculus humanus), a malaria mosquito (Anopheles gambiae), and the human malaria parasite (Plasmodium falciparum).
History
Date Modified
2017-06-02
Research Director(s)
Kevin W Bowyer
Scott J Emrich
Degree
Master of Science in Computer Science and Engineering