University of Notre Dame

File(s) under embargo

Bayesian Inference for Growth Mixture Models with an Unknown Number of Classes

dataset
posted on 2024-08-20, 17:21 authored by Meng Qiu
Growth mixture models (GMMs) have been widely used to capture the distinct growth trajectories of unobserved subpopulations (or latent classes). The traditional GMM determines the optimal number of classes through class enumeration: fitting a sequence of models with an increasing number of classes and then selecting the best-fitting model using statistical criteria. Despite its popularity, class enumeration has long been criticized for introducing severe subjectivity into the comparison of fitted models. Bayesian nonparametric (BNP) mixture modeling offers an alternative approach to detecting latent classes. The BNP approach avoids the subjectivity inherent in class enumeration by placing a prior on the mixing distribution, which indirectly induces a prior on the number of classes. Consequently, the number of classes can be inferred directly from the data. However, the BNP approach remains understudied in the context of GMMs. To address this gap, this dissertation aims to: 1) propose two BNP-GMMs based on the Dirichlet process mixture and the mixture of finite mixtures models; 2) compare the performance of the two proposed models in determining the number of classes K with that of the traditional GMM; and 3) evaluate the performance of the two proposed models in choosing K when using the posterior mode versus when using a loss function called variation of information (VI). Using Monte Carlo simulations, Study 1 compares the proposed models and the traditional GMM in choosing K when the model is correctly specified, while Study 2 compares them in choosing K when the latent mean structure is misspecified.
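To make concrete how a prior on the mixing distribution "indirectly induces a prior on the number of classes", the Python sketch below computes that induced prior for the Dirichlet process case only. It assumes the standard Chinese restaurant process view, in which observation i opens a new class with probability alpha / (alpha + i - 1); the function names and the values of n and alpha are hypothetical, and the mixture of finite mixtures, which places an explicit prior on K, is not sketched here.

```python
def crp_num_clusters_pmf(n, alpha):
    """Prior pmf of K_n, the number of occupied classes among n observations,
    under a Dirichlet process with concentration alpha (Chinese restaurant process).
    Returns a list where entry k is P(K_n = k); entry 0 is always 0."""
    if n < 1 or alpha <= 0:
        raise ValueError("need n >= 1 and alpha > 0")
    pmf = [0.0, 1.0]  # the first observation always opens class 1
    for i in range(2, n + 1):
        p_new = alpha / (alpha + i - 1)  # chance that observation i opens a new class
        nxt = [0.0] * (i + 1)
        for k in range(1, i):
            nxt[k] += pmf[k] * (1.0 - p_new)  # joins one of the k existing classes
            nxt[k + 1] += pmf[k] * p_new      # opens a new class
        pmf = nxt
    return pmf


def expected_num_clusters(n, alpha):
    """Closed form E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    pmf = crp_num_clusters_pmf(500, 1.0)  # hypothetical sample size and concentration
    print("most probable K:", max(range(len(pmf)), key=pmf.__getitem__))
    print("E[K]:", round(expected_num_clusters(500, 1.0), 3))
```

With alpha = 1 the prior mean of K grows only logarithmically in n (about 6.8 for n = 500), so the data, not a fixed enumeration sequence, decide how many of these classes are actually occupied a posteriori.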
Overall, the simulation results showed that: 1) the proposed models were more accurate when using VI than when using the mode; 2) when the population was homogeneous (comprising only one class), the proposed models using VI yielded the highest accuracy in choosing K, whereas when the population was heterogeneous (consisting of three classes), the proposed models using VI achieved superior accuracy in choosing K when class separation was large; and 3) the proposed models using VI were robust to the increased overfitting caused by model misspecification. For illustration, the proposed BNP-GMMs were applied to data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99.
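The abstract contrasts two ways of reading K off the posterior: the mode of the sampled number of classes, and the partition that minimizes posterior expected variation of information, where VI(c, c') = H(c) + H(c') - 2 I(c, c'). The Python sketch below illustrates both on MCMC draws of class labels. It assumes the draws are already available as label vectors, uses hypothetical function names, and restricts the VI search to the sampled partitions themselves, a common simplification that is not necessarily the search procedure used in the dissertation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the partition encoded by a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def variation_of_information(a, b):
    """VI(a, b) = H(a) + H(b) - 2 I(a, b), computed as 2 H(a, b) - H(a) - H(b)."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same observations")
    joint = entropy(list(zip(a, b)))  # entropy of the cross-tabulated partition
    return 2.0 * joint - entropy(a) - entropy(b)


def k_by_mode(draws):
    """Most frequent number of occupied classes across posterior draws."""
    return Counter(len(set(d)) for d in draws).most_common(1)[0][0]


def k_by_vi(draws):
    """Among the sampled partitions, pick the one with the smallest
    Monte Carlo estimate of posterior expected VI; return its K and labels."""
    best = min(draws, key=lambda c: sum(variation_of_information(c, d) for d in draws))
    return len(set(best)), best


if __name__ == "__main__":
    # Hypothetical label draws for six individuals from an MCMC run.
    draws = [[0, 0, 1, 1, 2, 2]] * 3 + [[0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
    print("K (mode):", k_by_mode(draws))
    print("K (VI):", k_by_vi(draws)[0])
```

VI is invariant to how the classes are labeled, so draws need no relabeling before they are compared, which is one practical advantage of a partition-level loss over summaries of K alone.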

History

Date Created

2024-08-15

Date Modified

2024-08-20

Defense Date

2024-06-30

CIP Code

  • 42.2799

Research Director(s)

Ke-Hai Yuan

Committee Members

  • Lijuan Wang
  • Johnny Zhang
  • Lizhen Lin

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Library Record

6613834

OCLC Number

1452763616

Publisher

University of Notre Dame

Additional Groups

  • Psychology

Program Name

  • Psychology, Research and Experimental