University of Notre Dame

New Methods for Assembly and Validation of Large Genomes

Thesis posted on 2009-12-10, authored by Michael R. Olson
Recent years have seen an explosion in the amount of genomic data available. However, before any analysis can be performed, an assembly step must be completed that combines the short DNA sequences generated by the sequencing technology into large sequences that more closely represent the DNA as it exists in the cell. This thesis presents a mate-pair based method of validating assemblies and identifying structural variation that relies on already existing draft assemblies. The pipeline is successful in finding structural variation, but less so in improving assembly quality. Additionally, a distributed overlap pipeline is presented that achieves improved runtimes over a typical sequential genome assembler. This pipeline is divided into two parts: a minimizer counter, which reduces memory consumption and allows parallelism at the cost of increased computation, and an aligner, which computes millions of alignments very efficiently in parallel.
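
The abstract does not say how the minimizer counter is implemented. To illustrate the general idea only, here is a minimal sketch of the standard (w, k)-minimizer technique: each read is summarized by the smallest k-mer in every window of w consecutive k-mers, and only reads that share a minimizer are passed on as candidate pairs for alignment. The function names, the lexicographic ordering, and the parameter values k=4 and w=3 are assumptions chosen for this example and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def minimizers(seq, k=4, w=3):
    """Return the set of (w, k)-minimizers of seq.

    For every window of w consecutive k-mers, keep the lexicographically
    smallest k-mer. Sequences with fewer than w k-mers contribute the
    smallest k-mer they have; sequences shorter than k contribute nothing.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    if len(kmers) < w:
        return {min(kmers)}
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


def count_minimizers(reads, k=4, w=3):
    """Count how many reads contain each minimizer."""
    counts = Counter()
    for read in reads:
        counts.update(minimizers(read, k, w))
    return counts


def candidate_pairs(reads, k=4, w=3):
    """Return index pairs (i, j), i < j, of reads sharing a minimizer.

    Only these pairs would need to be aligned; each minimizer bucket is
    independent, so buckets could be handed to separate workers.
    """
    buckets = {}
    for idx, read in enumerate(reads):
        for m in minimizers(read, k, w):
            buckets.setdefault(m, []).append(idx)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(ids, 2))
    return pairs


if __name__ == "__main__":
    reads = ["ACGTACGT", "GTACGTTT", "TTTTTTTT"]
    print(count_minimizers(reads))
    print(candidate_pairs(reads))
```

In this toy input the first two reads share a minimizer and form the only candidate pair, while the third read is never compared against the others. This reflects the trade-off the abstract describes: extra computation to find minimizers in exchange for less memory and independent units of work.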

History

Date Modified

2017-06-02

Research Director(s)

Scott Emrich

Committee Members

Douglas Thain, Greg Madey

Degree

  • Master of Science in Computer Science and Engineering

Degree Level

  • Master's Thesis

Language

  • English

Alternate Identifier

etd-12102009-211352

Publisher

University of Notre Dame

Program Name

  • Computer Science and Engineering
