Missing Data Methods for Exploratory Dynamic Factor Analysis
New technologies such as smartphones and wearable devices make it more feasible to collect intensive longitudinal data (ILD). ILD allows researchers to study intra-individual differences in finer detail, but it also brings new challenges. This type of research design involves repeated measures from the same individuals at many time points, and missing data is ubiquitous in such designs. Even though advances in data collection technologies have reduced the burden on participants in longitudinal studies, problems such as equipment malfunctions and limited battery power still make missing data a relevant issue. Missing data can lead to smaller sample sizes, reduced power, and biased estimates. Missingness in ILD studies is further complicated by dependencies between nearby time points. Therefore, effectively accommodating missing data is essential to analyzing intensive longitudinal data.
Dynamic factor analysis, a procedure combining factor analysis and time series analysis, is a popular data analytic tool for ILD. Previous studies have considered methods for handling missing data in ILD, but few of them considered latent variables. Examples include vector autoregressive models or Kalman filters for parameter estimation in dynamic factor analysis models. However, none of them compared popular missing data methods and examined their effects on estimates in exploratory dynamic factor analysis under a variety of conditions. In this dissertation, I compare four methods of handling missing data for exploratory dynamic factor analysis: pairwise deletion, listwise deletion, cross-sectional multiple imputation, and time series multiple imputation. I conduct a simulation study to examine the implications of these methods for point estimates. The simulation study varies four features: missing data mechanism, amount of missing data, time series length, and model size. I also demonstrate the four missing data methods with two empirical illustrations. The first uses a mood study of daily diary entries. The second involves physiological measurements from functional magnetic resonance imaging (fMRI).
The simulation study yields three main results: (1) listwise deletion performs poorly in most cases, except in some missing completely at random (MCAR) conditions; (2) pairwise deletion and the two multiple imputation procedures perform similarly in many cases; and (3) time series multiple imputation has large RMSE and bias in certain cases, such as the MCAR condition for some measurement variables. The results of the empirical illustrations are comparable with those of the simulation study. I discuss the implications and limitations of the current study.
History
Date Modified
2021-10-26
Defense Date
2021-07-14
CIP Code
- 42.2799
Research Director(s)
Guangjian Zhang
Committee Members
Ke-Hai Yuan, Zhiyong Zhang, Lijuan Wang
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Alternate Identifier
1280313239
Library Record
6135212
OCLC Number
1280313239
Program Name
- Psychology, Research and Experimental