Identifying Groups with Different Dynamic Patterns Using Cluster Analysis
dataset
posted on 2024-05-10, 20:54, authored by Dayoung Lee
Clustering is an exploratory technique for uncovering subgroups within a population, and it facilitates the development of subgroup-specific interventions or treatments. Although it is most commonly applied to cross-sectional data, several researchers have used it with multivariate time series to group together individuals with similar dynamic patterns (Aghabozorgi, Shirkhorshidi, & Wah, 2015).
The common raw data-based approach may fail to identify subgroups with distinctive dynamic factor structures, which are often the focus of psychological research but are not directly visible in the observed time series. In this dissertation, I develop a model-based clustering method that identifies subgroups characterized by the dynamic patterns of their latent factors. The method first fits a dynamic factor analysis model (Molenaar, 1985; Nesselroade, McArdle, Aggen, & Meyers, 2002; Browne & Zhang, 2007) to each individual's time series, then calculates distances between the parameter estimates of the fitted models, and finally groups individuals based on those distances.
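The two-stage idea (estimate a model for each person, then compare people through their estimates) can be sketched in plain Python. Fitting a dynamic factor analysis model is beyond a short example, so this sketch swaps in a much simpler stand-in, which is an assumption and not the dissertation's estimator: one least-squares lag-1 autoregressive coefficient per observed variable. The helper names (`simulate_ar1`, `ar1_coefficient`, `person_parameters`, `distance_matrix`), the use of Euclidean distance, and the simulated data are all hypothetical.

```python
import math
import random


def simulate_ar1(phi, length, rng):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(length - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ar1_coefficient(series):
    """Least-squares lag-1 autoregressive coefficient of the mean-centered series."""
    mean = sum(series) / len(series)
    x = [v - mean for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def person_parameters(person):
    """Hypothetical stand-in for DFA estimates: one AR(1) coefficient per variable.

    `person` is a list of variables, each a list of observations over time.
    """
    return [ar1_coefficient(variable) for variable in person]


def distance_matrix(vectors):
    """Pairwise Euclidean distances between per-person parameter vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


# Four hypothetical people with two variables each: the first two share
# strong lag-1 dynamics (phi = 0.7), the last two have none (phi = 0.0).
rng = random.Random(1)
people = [[simulate_ar1(phi, 500, rng) for _ in range(2)] for phi in (0.7, 0.7, 0.0, 0.0)]
params = [person_parameters(p) for p in people]
distances = distance_matrix(params)
for row in distances:
    print(" ".join(f"{d:.2f}" for d in row))
```

In a full implementation, `person_parameters` would instead return the parameters of each person's fitted dynamic factor model (for example, factor loadings and lagged relations among factors), and the resulting distance matrix would feed the grouping step.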
I present an empirical illustration of four clustering methods: model-based clustering with the K-means algorithm, model-based clustering with a hierarchical algorithm, raw data-based clustering with the K-means algorithm, and raw data-based clustering with a hierarchical algorithm. I also conduct a simulation study to compare the performance of the four methods on multivariate time series.
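Each of the four methods pairs an input representation (raw data or per-person parameter estimates) with a clustering algorithm (K-means or hierarchical). A minimal pure-Python sketch of both algorithms follows; Euclidean distance, average linkage, the random initialization, the function names, and the toy feature vectors are assumptions for illustration, not details taken from the study.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means: alternate nearest-center assignment and mean updates."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hierarchical(points, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average between-cluster distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for idx in members:
            labels[idx] = c
    return labels


# Hypothetical per-person feature vectors (e.g. parameter estimates in the
# model-based approach); the first three and last three form two groups.
points = [[0.72, 0.68], [0.69, 0.71], [0.74, 0.66], [0.02, -0.03], [-0.01, 0.04], [0.03, 0.01]]
km = kmeans(points, 2)
hc = hierarchical(points, 2)
print("K-means:     ", km)
print("hierarchical:", hc)
```

The same two functions serve both approaches: only the input vectors change (observed data for the raw data-based approach, fitted parameters for the model-based approach), which is what makes the four-way comparison a clean crossing of representation and algorithm.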
The simulation results indicate that (1) the model-based approach had higher cluster recovery rates and adjusted Rand indices than the raw data-based approach; (2) within the same clustering approach, the K-means algorithm had slightly higher cluster recovery rates and adjusted Rand indices than the hierarchical algorithm in most conditions; and (3) the cluster recovery rates of the clustering validation indices varied with the number of population clusters.
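The adjusted Rand index (ARI) compares a recovered partition with the true one while correcting for chance agreement: it equals 1 for identical partitions (regardless of how the clusters are labeled) and is near 0 for chance-level agreement. A minimal sketch of the standard Hubert and Arabie formulation follows; the function name and example labels are illustrative, and the sketch covers only the ARI, not the study's cluster recovery rate.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two partitions of the same individuals."""
    n = len(true_labels)
    # Pairs of individuals placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    # Pairs placed together in each partition separately (row and column totals).
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put everyone in one cluster
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0]))  # 1.0: same partition, swapped labels
print(adjusted_rand_index(truth, [0, 0, 1, 1, 1, 1]))  # about 0.324: one person misassigned
```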
Finally, I discuss the methodological and applied implications of the study, its limitations and directions for future research, and its conclusions.