University of Notre Dame
MolikDC052021D.pdf (7.12 MB)

On the Genetic Information Processing of Metabarcodes: Adding Data Utility with Provenance and Metadata

thesis
posted on 2021-05-09, 00:00 authored by David C Molik

This body of work explores information processing of metabarcodes. It seeks to answer the following questions: What does this string of metabarcode DNA mean? Where does it come from? And how do we utilize it in new and interesting ways? This is done by implementing a genomic informatics processing framework that utilizes provenance and metadata to increase the data utility of metabarcode data. Three terms are critical to the understanding of this body of work: provenance, the history of the data; metadata, information about the data; and data utility, the general reusability of the data. Under these definitions, provenance is a subset of metadata. I show an increase in data utility in three ways. First, I look at the different features of the metabarcode itself and explore how manipulation of those features can start to explain variance in the analysis of metabarcode data. In chapter two, agent-based simulations are used to analyze features, such as the relative abundance of barcodes, to show their effect on resulting metabarcode datasets. Unsurprisingly, varying the abundance of metabarcode sequences results in variance in the similarity between samples. Other features, like the addition of single nucleotide polymorphisms, can also result in variance in similarity. Second, I show how metadata and provenance of previously published metadata can be utilized to further describe the environment of a species of interest. In chapter three, I utilize natural language processing techniques to draw conclusions about the environment of a particular species, a human pathogen known as Cryptococcus neoformans.
Lastly, by utilizing the metadata of various metabarcode datasets, I show that we can now not only explore the intermixing of various previously published metabarcode data but also derive new estimations of arthropod diversity and rarefaction. In chapter four, I implement a novel data framework called met, which utilizes the metadata from different metabarcode datasets to make comparisons across different projects. I conclude that by utilizing a genomic informatics processing framework we can increase the data utility of the metabarcode. This is useful because it allows us to gain more "bang for our buck," to use an adage.

History

Date Modified

2021-07-12

Defense Date

2021-04-30

CIP Code

  • 26.0102

Research Director(s)

Michael Pfrender

Committee Members

Stuart Jones, Scott Emrich, Natalie Meyers, Elizabeth Archie

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Alternate Identifier

1258685523

Library Record

6046522

OCLC Number

1258685523

Additional Groups

  • Integrated Biomedical Sciences
  • Biological Sciences

Program Name

  • Integrated Biomedical Sciences
