University of Notre Dame

Algorithms for Assembly Consolidation and Prediction of Large-Scale Genome Structures

thesis
posted on 2019-10-14, 00:00 authored by Shenglong Zhu

Genome structure is the order and orientation of the pieces of DNA that make up a genome, which encodes the information of life. DNA sequencing technology has advanced and sequence data is now massively available, so genome structure cannot easily be studied without efficient, purpose-built algorithms. In this dissertation, we study three problems related to genome structure: structural error correction of draft genome assemblies, inversion prediction, and operon prediction. Our work on draft genome assemblies explores a novel Maximum Alternating Path Cover (MAPC) model to improve the correctness of genome assemblies and of downstream analysis. Our work on inversion prediction aims to predict and catalog inversions. It applies the well-known Range Maximum Query and Max-Cut models to what we call "global" inversions, and the novel Rectangle Clustering and Representative Rectangle Prediction models to more localized inversions. For operon prediction, we again apply the MAPC model (with improved algorithms and theoretical analysis), coupled with a novel Intra-Column Exclusive Clustering model, to predict and catalog operons in closely related species. Evaluated on both simulated and real genome data, our algorithms and implementations show substantial promise for accurate computational analysis of genome structure in significantly less time.

History

Date Modified

2019-12-11

Defense Date

2019-10-07

CIP Code

  • 40.0501

Research Director(s)

Scott J. Emrich

Committee Members

  • Taeho Jung
  • Gregory R. Madey
  • Meng Jiang

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Alternate Identifier

1130230537

Library Record

5324741

OCLC Number

1130230537

Program Name

  • Computer Science and Engineering