OlsonM122009.pdf (581.01 kB)
New Methods for Assembly and Validation of Large Genomes
thesis
posted on 2009-12-10, 00:00 authored by Michael R. Olson
Recent years have seen an explosion in the amount of genomic data available. However, before any analysis can be performed, an assembly step must be completed that combines the short DNA sequences generated by the sequencing technology into large sequences that more closely represent the DNA as it exists in the cell. This thesis presents a mate-pair based method of validating assemblies and identifying structural variation that relies on already existing draft assemblies. The pipeline is successful in finding structural variation, but less so in improving assembly quality. Additionally, a distributed overlap pipeline is presented that achieves improved runtimes over a typical sequential genome assembler. This pipeline is divided into two parts: a minimizer counter, which reduces memory consumption and allows parallelism at the cost of increased computation, and an aligner, which computes millions of alignments very efficiently in parallel.
History
Date Modified
2017-06-02
Research Director(s)
Scott Emrich
Committee Members
Douglas Thain
Greg Madey
Degree
- Master of Science in Computer Science and Engineering
Degree Level
- Master's Thesis
Language
- English
Alternate Identifier
etd-12102009-211352
Publisher
University of Notre Dame
Program Name
- Computer Science and Engineering