Bayesian Inference for Growth Mixture Models with an Unknown Number of Classes
Posted on 2024-08-20, 17:21; authored by Meng Qiu
Growth mixture models (GMMs) have been widely used to capture different growth trajectories of unobserved subpopulations (or latent classes). The traditional GMM determines the optimal number of classes through a process called class enumeration, which involves fitting a sequence of models with an increasing number of classes and then selecting the best-fitting model using statistical criteria. Despite its popularity, class enumeration has long been criticized for introducing severe subjectivity when comparing the fitted models.
Bayesian nonparametric (BNP) mixture modeling offers an alternative approach to detecting latent classes. The BNP approach circumvents the subjectivity inherent in class enumeration by placing a prior on the mixing distribution, which indirectly induces a prior on the number of classes. Consequently, the number of classes can be inferred directly from the data. However, the BNP approach remains understudied in the context of GMMs. To help close this research gap, the dissertation aims to: 1) propose two BNP-GMMs, one based on the Dirichlet process mixture model and one based on the mixture of finite mixtures model; 2) compare the performance of the two proposed models in determining the number of classes, K, with that of the traditional GMM; and 3) evaluate the performance of the two proposed models in choosing K when using the mode versus when using a loss function called the variation of information (VI).
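To make concrete how a prior on the mixing distribution induces a prior on the number of classes, the following is a minimal sketch for the Dirichlet process case only. It uses the standard Chinese restaurant process representation, in which observation i opens a new class with probability alpha / (alpha + i - 1). The function name `crp_prior_on_k`, the sample size, and the concentration value `alpha` are illustrative assumptions and are not taken from the dissertation.

```python
def crp_prior_on_k(n, alpha):
    """Exact prior P(K = k), k = 0..n, on the number of occupied classes
    among n observations under a Dirichlet process with concentration alpha.

    Hypothetical helper for illustration; not the dissertation's code.
    """
    if n < 1 or alpha <= 0:
        raise ValueError("need n >= 1 and alpha > 0")
    probs = [0.0] * (n + 1)
    probs[1] = 1.0  # the first observation always opens the first class
    for i in range(2, n + 1):
        p_new = alpha / (alpha + i - 1)  # chance observation i opens a new class
        updated = [0.0] * (n + 1)
        for k in range(1, i + 1):
            # Either we had k - 1 classes and open a new one,
            # or we had k classes and join an existing one.
            updated[k] = probs[k - 1] * p_new + probs[k] * (1.0 - p_new)
        probs = updated
    return probs


# Illustrative values only: 50 observations, alpha = 1.
prior = crp_prior_on_k(50, 1.0)
expected_k = sum(k * p for k, p in enumerate(prior))
print(f"prior mean of K: {expected_k:.3f}")
```

In this sketch the prior on K is never specified directly; it follows entirely from `alpha` and the sample size, which is the sense in which the prior on the number of classes is induced indirectly.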
Based on Monte Carlo simulations, Study 1 compares the proposed models and the traditional GMM in choosing K when there is no model misspecification, while Study 2 compares them in choosing K when there is model misspecification in the latent mean structure. Overall, simulation results showed that: 1) the proposed models were more accurate in choosing K when using VI than when using the mode; 2) when the population was homogeneous (comprising only one class), the proposed models using VI yielded the highest accuracy in choosing K, whereas when the population was heterogeneous (consisting of three classes), the proposed models using VI achieved superior accuracy in choosing K only when class separation was large; and 3) the proposed models using VI demonstrated robustness against the exacerbated overfitting caused by model misspecification. For illustration, the proposed BNP-GMMs were applied to data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99.
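The variation of information referred to above is a distance between two partitions of the same individuals, defined as VI(A, B) = H(A) + H(B) - 2 I(A, B), where H is entropy and I is mutual information. The sketch below computes it from two label vectors and then selects, among a set of sampled partitions, the one with the smallest average VI to the others. Restricting the search to the sampled partitions is a simplifying assumption made here for illustration; the function names and the toy draws are hypothetical and do not reproduce the dissertation's procedure or data.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """VI between two partitions given as equal-length label sequences."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same individuals")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    vi = 0.0
    for (i, j), n_ij in joint.items():
        # -p_ij * [log(p_ij / p_i) + log(p_ij / p_j)], summed over cells,
        # equals H(A) + H(B) - 2 I(A, B).
        vi -= (n_ij / n) * (log(n_ij / count_a[i]) + log(n_ij / count_b[j]))
    return vi


def min_expected_vi(draws):
    """Return the sampled partition with the lowest mean VI to all draws.

    Assumption: candidates are limited to the draws themselves.
    """
    def mean_vi(candidate):
        return sum(variation_of_information(candidate, d) for d in draws) / len(draws)
    return min(draws, key=mean_vi)


# Toy posterior draws of class labels for four individuals (made up).
draws = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 2, 3]]
best = min_expected_vi(draws)
print(best, "-> K =", len(set(best)))
```

Because VI depends only on how individuals are grouped, the first two toy draws count as the same partition even though their labels are swapped, so this kind of summary is unaffected by label switching.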