Response times (RTs) on test items are a valuable source of information concerning examinees and the items themselves. As such, they have the potential to improve a wide variety of measurement activities. However, researchers have found that empirical RT distributions can exhibit a variety of shapes among the items within a single test. Although a number of semiparametric and “flexible” parametric models are available, no single model can accommodate all plausible shapes of empirical RT distributions. Thus, the goal of this research was to study a few of the potential consequences of RT model misspecification in educational measurement. In particular, two promising applications of RT models were of interest: examinee ability estimation and item selection in computerized adaptive testing (CAT).
First, by jointly modeling RTs and item responses, RTs can be used as collateral information in the estimation of examinee ability. This can be accomplished by embedding separate models for RTs and item responses in Level 1 of a hierarchical model and allowing their parameters to correlate in Level 2. If the RT model is misspecified, a potential drawback of this hierarchical structure is that any negative impact on estimates of the RT model parameters may, in turn, negatively impact ability estimates. However, a simulation study found that estimates of the RT model parameters were robust to misspecification of the RT model. In turn, ability estimates were also robust.
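To make the two-level structure concrete, the following is a minimal simulation sketch of one possible instance of such a hierarchical model. It is not the model used in the study: the two-parameter logistic response model, the lognormal RT model, the bivariate normal distribution for ability and speed, and all names (`simulate_person`, `simulate_item`, `rho`, `sd_tau`) are assumptions chosen for illustration.

```python
import math
import random


def simulate_person(rng, rho, sd_tau=0.3):
    """Level 2 (assumed): draw ability theta and speed tau from a
    bivariate normal distribution with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    theta = z1  # ability, standard normal
    # speed, correlated with ability through the shared draw z1
    tau = sd_tau * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return theta, tau


def simulate_item(rng, theta, tau, a, b, alpha, beta):
    """Level 1 (assumed): a 2PL model for the response and a
    lognormal model for the RT, conditionally independent."""
    # 2PL probability of a correct response
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    response = 1 if rng.random() < p else 0
    # log RT is normal with mean (beta - tau) and standard deviation 1 / alpha
    log_rt = rng.gauss(beta - tau, 1.0 / alpha)
    return response, math.exp(log_rt)


rng = random.Random(0)
theta, tau = simulate_person(rng, rho=0.5)
response, rt = simulate_item(rng, theta, tau, a=1.2, b=0.0, alpha=2.0, beta=4.0)
```

Because ability and speed share variance at Level 2 in this sketch, an examinee's RTs carry information about ability, which is the sense in which RTs act as collateral information.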
Second, by considering the time intensity of items during item selection in CAT, test completion times can be reduced without sacrificing the precision of ability estimates. This can be done by choosing items that maximize the ratio of item information to the examinee’s predicted RT. However, an RT model is needed to make the prediction; if the RT model is misspecified, this method may not perform as intended. A simulation study found that whether the correct RT model was used to make the prediction had no bearing on test completion times. Additionally, using a simple average RT as the prediction was just as effective as model-based prediction in reducing test completion times.
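The selection criterion described above can be sketched as follows. This is an illustrative sketch only: it assumes 2PL item information and a lognormal RT model for the prediction, neither of which is specified here, and the function names and the item dictionary layout (`a`, `b`, `alpha`, `beta`) are hypothetical.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def expected_rt(tau, alpha, beta):
    """Expected RT under an assumed lognormal model in which
    log RT is normal with mean (beta - tau) and variance 1 / alpha**2."""
    return math.exp(beta - tau + 1.0 / (2.0 * alpha ** 2))


def select_item(theta, tau, items, administered=()):
    """Return the id of the unused item with the largest ratio of
    information to predicted RT, or None if no item remains."""
    best_id, best_ratio = None, -math.inf
    for item_id, item in items.items():
        if item_id in administered:
            continue
        info = item_information(theta, item["a"], item["b"])
        ratio = info / expected_rt(tau, item["alpha"], item["beta"])
        if ratio > best_ratio:
            best_id, best_ratio = item_id, ratio
    return best_id


# Hypothetical two-item pool: equal information at theta = 0,
# but q2 is less time intensive (smaller beta).
pool = {
    "q1": {"a": 1.0, "b": 0.0, "alpha": 2.0, "beta": 4.0},
    "q2": {"a": 1.0, "b": 0.0, "alpha": 2.0, "beta": 3.5},
}
chosen = select_item(theta=0.0, tau=0.0, items=pool)
```

Replacing `expected_rt` with a function that returns an item's observed average RT would give the simpler, model-free variant of the prediction.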