Algorithms for Assembly Consolidation and Prediction of Large-Scale Genome Structures

Doctoral Dissertation

Abstract

Genome structure is the order and orientation of the pieces of DNA that make up a genome, which carries the information of life. With advances in DNA sequencing technology and the resulting availability of massive amounts of sequence data, genome structure can no longer be studied effectively without efficient, purpose-built algorithms. In this dissertation, we study three problems related to genome structure: structural error correction of draft genome assemblies, inversion prediction, and operon prediction. For draft genome assemblies, we explore a novel Maximum Alternating Path Cover (MAPC) model to improve the correctness of assembled genomes and of downstream analysis. For inversion prediction, we aim to predict and catalog inversions using the well-known Range Maximum Query and Max-Cut models for what we call “global” inversions, and the novel Rectangle Clustering and Representative Rectangle Prediction models for more localized inversions. For operon prediction, we again apply the MAPC model (with improved algorithms and theoretical analysis), coupled with a novel Intra-Column Exclusive Clustering model, to predict and catalog operons in closely related species. Evaluated on both simulated and real genome data, our algorithms and implementations show substantial promise for accurate computational analysis of genome structure in significantly less time.
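The abstract names the Range Maximum Query (RMQ) model without saying how it is applied to inversions. For readers unfamiliar with RMQ itself, here is a minimal sketch of the generic textbook data structure: a sparse table that answers "largest value in positions lo..hi" in constant time after O(n log n) preprocessing. It illustrates only the standard technique, not the dissertation's inversion-prediction method; the class name `SparseTableMax` and its interface are hypothetical.

```python
from typing import List


class SparseTableMax:
    """Generic Range Maximum Query via a sparse table (hypothetical helper).

    Illustrates the textbook RMQ technique only; this is NOT the
    dissertation's inversion-prediction algorithm.
    Build: O(n log n). Query: O(1).
    """

    def __init__(self, values: List[int]) -> None:
        n = len(values)
        # log2[i] = floor(log2(i)) for 1 <= i <= n
        self.log2 = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log2[i] = self.log2[i // 2] + 1
        # table[k][i] = max(values[i : i + 2**k])
        self.table = [list(values)]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            # Each 2**k window is the max of two adjacent 2**(k-1) windows.
            self.table.append(
                [max(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)]
            )
            k += 1

    def query(self, lo: int, hi: int) -> int:
        """Return max(values[lo..hi]) for an inclusive range lo <= hi."""
        if not 0 <= lo <= hi < len(self.table[0]):
            raise IndexError("invalid range")
        k = self.log2[hi - lo + 1]
        row = self.table[k]
        # Two (possibly overlapping) power-of-two windows cover [lo, hi].
        return max(row[lo], row[hi - (1 << k) + 1])


st = SparseTableMax([3, 1, 4, 1, 5, 9, 2, 6])
print(st.query(1, 5))  # largest of 1, 4, 1, 5, 9
```

The overlapping-window trick works because max is idempotent, so counting an element twice does not change the answer.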

Attributes

Author: Shenglong Zhu
Contributor: Taeho Jung, Committee Member
Contributor: Gregory R. Madey, Committee Member
Contributor: Scott J. Emrich, Research Director
Contributor: Danny Z. Chen, Research Director
Contributor: Meng Jiang, Committee Member
Degree Level: Doctoral Dissertation
Degree Discipline: Computer Science and Engineering
Degree Name: Doctor of Philosophy
Banner Code: PHD-CSE
Defense Date: 2019-10-07
Submission Date: 2019-10-14
Subject:
  • Genome assembly improvement
  • Structural error correction
  • Inversion detection
  • Approximation algorithms
  • Operon detection
  • Bioinformatics
  • Algorithms
  • Rectangle clustering
  • Intra-column exclusive clustering
  • Maximum alternating path cover
  • Representative rectangle prediction
Language: English
Record Visibility and Access: Public
Content License: All rights reserved
