Evaluation of a Two-Stage Statistical Learning Design for Genome-Wide Studies

Master's Thesis

Abstract

Twin and family studies show that many common traits and disorders are highly heritable, but genome-wide association studies (GWAS) have been largely unable to identify the specific single nucleotide polymorphisms (SNPs) that explain this heritability at the genetic level. Recent work suggests that statistical learning methods such as gradient boosting machines (GBM) may be a viable alternative to conventional methods, especially after adjusting for the structure of SNP data. The current research evaluates a two-stage design for GWAS. In the first stage, GBM is used as a variable selection screen that substantially reduces the dimensionality of the SNP data while remaining sensitive to additive, nonlinear, and interaction effects. This allows second-stage hypothesis testing with a reduced multiple-testing burden. Thorough simulations show that the proposed two-stage design can substantially improve power to detect SNPs with true effects across a wide range of conditions. The limitations of this design and potential improvements to it are also explored.
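The two-stage logic can be sketched roughly as below. This is a minimal, self-contained illustration under stated assumptions, not the thesis's implementation. It swaps in a hand-rolled L2 gradient boosting with depth-1 trees (stumps) for a full GBM library, scores variable importance as each SNP's summed squared-error reduction, splits the sample in half so that screening and testing use independent data (an assumption; the abstract does not say how the two stages share data), and applies a Bonferroni correction over only the k screened SNPs. The function names (`fit_stump`, `boost_importance`, `assoc_pvalue`, `two_stage`, `simulate`) and all parameter values are hypothetical. For a sense of scale: at alpha = 0.05, a Bonferroni correction over 10^6 SNPs gives a per-test threshold of 5 × 10^-8, whereas over 1,000 screened SNPs it gives 5 × 10^-5.

```python
# Hypothetical sketch of a two-stage GWAS screen (not the thesis's code).
import random
from statistics import NormalDist, fmean


def fit_stump(X, r, j):
    """Best depth-1 split of SNP j (genotypes coded 0/1/2) for residuals r.

    Returns (gain, threshold, left_mean, right_mean); gain is the drop in
    squared error, and threshold is None if no split improves the fit.
    """
    n, total = len(r), sum(r)
    best = (0.0, None, 0.0, 0.0)
    for t in (0.5, 1.5):  # the only two cut points for a 0/1/2 genotype
        left = [r[i] for i in range(n) if X[i][j] < t]
        nl = len(left)
        if nl == 0 or nl == n:
            continue
        sl = sum(left)
        sr, nr = total - sl, n - nl
        gain = sl * sl / nl + sr * sr / nr - total * total / n
        if gain > best[0]:
            best = (gain, t, sl / nl, sr / nr)
    return best


def boost_importance(X, y, rounds=60, lr=0.1):
    """L2 gradient boosting with stumps; importance = summed split gains per SNP."""
    n, p = len(y), len(X[0])
    pred = [fmean(y)] * n
    importance = [0.0] * p
    for _ in range(rounds):
        # For squared-error loss the negative gradient is the residual.
        r = [y[i] - pred[i] for i in range(n)]
        fits = [fit_stump(X, r, j) for j in range(p)]
        j = max(range(p), key=lambda k: fits[k][0])
        gain, t, left_mean, right_mean = fits[j]
        if t is None:  # no SNP reduces the loss any further
            break
        importance[j] += gain
        for i in range(n):
            pred[i] += lr * (left_mean if X[i][j] < t else right_mean)
    return importance


def assoc_pvalue(g, y):
    """Two-sided p-value for the slope of y on genotype g (normal approximation)."""
    n = len(y)
    mg, my = fmean(g), fmean(y)
    sxx = sum((a - mg) ** 2 for a in g)
    if sxx == 0:  # monomorphic SNP in this sample: nothing to test
        return 1.0
    beta = sum((a - mg) * (b - my) for a, b in zip(g, y)) / sxx
    rss = sum((b - my - beta * (a - mg)) ** 2 for a, b in zip(g, y))
    if rss == 0:  # perfect fit
        return 0.0 if beta != 0 else 1.0
    se = (rss / (n - 2) / sxx) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(beta / se)))


def two_stage(X, y, k=5, alpha=0.05, seed=0):
    """Stage 1: rank SNPs by boosting importance on one half of the sample.
    Stage 2: test only the top k SNPs on the other half, Bonferroni at alpha/k."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    s1, s2 = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    imp = boost_importance([X[i] for i in s1], [y[i] for i in s1])
    selected = sorted(range(len(imp)), key=lambda j: -imp[j])[:k]
    hits = []
    for j in selected:
        pval = assoc_pvalue([X[i][j] for i in s2], [y[i] for i in s2])
        if pval < alpha / k:  # burden scales with k, not the total SNP count
            hits.append(j)
    return sorted(hits)


def simulate(n=400, p=40, effects=None, maf=0.3, seed=1):
    """Hypothetical data: p independent SNPs with additive effects plus N(0, 1) noise."""
    effects = {3: 0.5, 17: 0.4} if effects is None else effects
    rng = random.Random(seed)
    X = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
         for _ in range(n)]
    y = [sum(b * row[j] for j, b in effects.items()) + rng.gauss(0, 1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    print("SNPs declared significant:", two_stage(X, y))
```

Because stumps split on a single SNP, this sketch captures only main effects; screening for the interaction effects mentioned above would require deeper trees (an interaction depth of 2 or more).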

Attributes

URN
  • etd-04162013-072727

Author Raymond Kenney Walters
Advisor Gitta Lubke
Contributor Scott Maxwell, Committee Member
Contributor Jiahan Li, Committee Member
Contributor Gitta Lubke, Committee Chair
Degree Level Master's Thesis
Degree Discipline Psychology
Degree Name MA
Defense Date
  • 2013-03-19

Submission Date 2013-04-16
Country
  • United States of America

Subject
  • data mining

  • machine learning

  • statistical genetics

  • regression trees

  • behavioral genetics

  • variable importance

  • statistical power

Publisher
  • University of Notre Dame

Language
  • English

Record Visibility and Access Public
Content License
  • All rights reserved
