Algorithms for Assembly Consolidation and Prediction of Large-Scale Genome Structures
Genome structure is the order and orientation of the pieces of DNA that make up a genome, which contains the information of life. With advances in DNA sequencing technology and the now-massive availability of sequence data, genome structure cannot easily be studied without efficient, purpose-built algorithms. In this dissertation, we study three problems related to genome structure: structural error correction of draft genome assemblies, inversion prediction, and operon prediction. Our work on draft genome assemblies explores a novel Maximum Alternating Path Cover (MAPC) model to improve genome correctness and downstream analysis. Our work on inversion prediction aims to predict and catalog inversions by exploring the well-known Range Maximum Query and Max-Cut models for what we call "global" inversions, and the novel Rectangle Clustering and Representative Rectangle Prediction models for more localized inversions. For operon prediction, we again apply the MAPC model (with improved algorithms and theoretical analysis), coupled with a novel Intro-Column Exclusive Clustering model, to predict and catalog operons in closely related species. Evaluated on both simulated and real genome data, our algorithms and implementations show substantial promise for accurate computational analysis of genome structure in significantly less time.
History
Date Modified
2019-12-11
Defense Date
2019-10-07
CIP Code
- 40.0501
Research Director(s)
- Scott J. Emrich
- Danny Z. Chen
Committee Members
- Taeho Jung
- Gregory R. Madey
- Meng Jiang
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Language
- English
Alternate Identifier
1130230537
Library Record
5324741
OCLC Number
1130230537
Additional Groups
- Computer Science and Engineering
Program Name
- Computer Science and Engineering