Evaluation of a Two-Stage Statistical Learning Design for Genome-Wide Studies

Walters, Raymond Kenney

doi:10.7274/p2676t07f3d

File(s) under permanent embargo

thesis

posted on 2013-04-16, 00:00 authored by Raymond Kenney Walters

Twin and family studies show that many common traits and disorders are highly heritable, but genome-wide association studies (GWAS) have been largely unable to identify specific single nucleotide polymorphisms (SNPs) explaining this heritability at the genetic level. Recent work suggests statistical learning methods like gradient boosting (GBM) may be a viable alternative to conventional methods, especially after adjustments for the structure of SNP data. The current research evaluates a two-stage research design for GWAS. GBM is used as a first-stage variable selection screen to substantially reduce the dimensionality of SNP data while maintaining sensitivity to additive, nonlinear, and interaction effects, allowing hypothesis testing with a reduced multiple testing burden in the second-stage analysis. Thorough simulations show that the proposed two-stage design can substantially improve power to detect effect SNPs across a wide range of conditions. The limitations of this design and potential improvements to it are explored.
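The reduced multiple testing burden mentioned in the abstract can be illustrated with a small arithmetic sketch. It assumes a Bonferroni correction, which the abstract does not name, and the SNP counts are hypothetical values chosen only for illustration; neither is taken from the thesis.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical numbers (assumptions, not values from the thesis).
ALPHA = 0.05
N_SNPS_TOTAL = 500_000    # SNPs tested in a conventional single-stage analysis
N_SNPS_SCREENED = 1_000   # SNPs retained by a first-stage screen

# Threshold when every SNP is tested directly.
single_stage = bonferroni_threshold(ALPHA, N_SNPS_TOTAL)

# Threshold when only the screened SNPs reach the second-stage test.
two_stage = bonferroni_threshold(ALPHA, N_SNPS_SCREENED)

print(f"single-stage threshold: {single_stage:.1e}")
print(f"two-stage threshold:    {two_stage:.1e}")
```

Under these assumed counts the second-stage threshold is 500 times less stringent. The sketch covers only the threshold arithmetic; it does not model the screening step itself or any adjustment needed when screening and testing use the same data.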

History

Date Modified

2017-06-02

Research Director(s)

Gitta Lubke

Committee Members

Scott Maxwell, Jiahan Li

Degree

Master of Arts

Degree Level

Master's Thesis

Language

English

Alternate Identifier

etd-04162013-072727

Publisher

University of Notre Dame

Program Name

Psychology

Keywords

data mining, machine learning, statistical genetics, regression trees, behavioral genetics, variable importance, statistical power
