On the Genetic Information Processing of Metabarcodes: Adding Data Utility with Provenance and Metadata
This body of work explores the information processing of metabarcodes. It seeks to answer the following questions: What does a given string of metabarcode DNA mean? Where does it come from? And how can we use it in new and interesting ways? I address these questions by implementing a genomic informatics processing framework that uses provenance and metadata to increase the data utility of metabarcode data. Three terms are critical to understanding this work: provenance, the history of the data; metadata, information about the data; and data utility, the general reusability of the data. Under these definitions, provenance is a subset of metadata. I show an increase in data utility in three ways. First, I examine the different features of the metabarcode itself and explore how manipulating those features can begin to explain variance in the analysis of metabarcode data. In chapter two, agent-based simulations are used to analyze features such as the relative abundance of barcodes and to show their effect on the resulting metabarcode datasets. Unsurprisingly, varying the abundance of metabarcode sequences produces variance in the similarity between samples. Other features, such as the addition of single nucleotide polymorphisms (SNPs), can also produce variance in similarity. Second, I show how the metadata and provenance of previously published data can be used to further describe the environment of a species of interest. In chapter three, I use natural language processing techniques to draw conclusions about the environment of a particular species, the human pathogen Cryptococcus neoformans.
Lastly, by using the metadata of various metabarcode datasets, I show that we can not only explore the intermixing of previously published metabarcode data but also derive new estimates of arthropod diversity and rarefaction. In chapter four, I implement a novel data framework called met, which uses the metadata from different metabarcode datasets to make comparisons across projects. I conclude that by using a genomic informatics processing framework we can increase the data utility of metabarcodes. This is useful because it allows us, to use an adage, to get more “bang for our buck.”
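Rarefaction, one of the estimates mentioned above, asks how many taxa we would expect to observe had a sample been sequenced to a smaller, common read depth, which makes richness comparable across datasets of different sizes. The following is a minimal sketch, assuming a single sample given as per-taxon read counts and using Hurlbert's classic expected-richness formula; it is not the met framework's implementation, and the function names `rarefy` and `rarefaction_curve` are hypothetical.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of distinct taxa in a random subsample of n reads,
    drawn without replacement from a sample with the given per-taxon counts
    (Hurlbert's rarefaction formula)."""
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds total number of reads")
    denom = comb(total, n)
    # For each observed taxon, add the probability that at least one of its
    # reads lands in the subsample: 1 - C(total - c, n) / C(total, n).
    # math.comb returns 0 when n > total - c, i.e. the taxon is always present.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step=1):
    """List of (depth, expected richness) pairs from `step` up to full depth."""
    total = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(step, total + 1, step)]


if __name__ == "__main__":
    # Hypothetical read counts for four taxa in one sample.
    sample = [5, 3, 1, 1]
    for depth, richness in rarefaction_curve(sample):
        print(f"{depth:>3} reads -> {richness:.2f} expected taxa")
```

At full depth the curve reaches the observed richness (4 taxa here), and a curve that is still rising steeply at full depth suggests that the sample has not been sequenced deeply enough to capture its diversity.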
History
Date Modified
2021-07-12
Defense Date
2021-04-30
CIP Code
- 26.0102
Research Director(s)
Michael Pfrender
Committee Members
- Stuart Jones
- Scott Emrich
- Natalie Meyers
- Elizabeth Archie
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Language
- English
Alternate Identifier
1258685523
Library Record
6046522
OCLC Number
1258685523
Additional Groups
- Integrated Biomedical Sciences
- Biological Sciences
Program Name
- Integrated Biomedical Sciences