DETECTING DATA IRREGULARITY BY CONSIDERING RESPONSES AND RESPONSE TIME

Keywords

Aberrant Behaviors
Computerized Test Administration
Item Response Theory
Response Time

Abstract

During a computerized test administration, whether online or offline, the time that an examinee spends on each item can be easily recorded. This response time information, combined with item responses, could provide more information for detecting data irregularities than responses alone. When some examinees answer multiple-choice test items much faster than other examinees, this could indicate a data irregularity. Such irregularities could occur for many reasons, including pre-knowledge of the items and rapid guessing when running out of time at the end of a test. These aberrant behaviors, which cannot be detected in paper-based tests, could threaten the security of computerized tests and undermine the integrity of test results. Therefore, efforts should be made to detect data irregularities, and further investigation may be needed to ensure that test results are as reliable, fair, and valid as possible. In addition, some researchers have shown that accounting for data irregularities yields better measures of ability (Bolsinova, De Boeck, & Tijmstra, 2017; De Boeck & Minjeong, 2019; Marianti, Fox, Avetisyan, & Veldkamp, 2014; Widiatmo & Wright, 2015).
Several methods can be used to exclude irregularities based on response times and responses (Ratcliff, 2003). One possible approach is a threshold method (Wise & Kong, 2015) called Response Time Effort (RTE), which is the proportion of items on which an examinee spent sufficient time. Another approach is to use a statistical model that can detect data irregularities (e.g., Anders, Alario, & Van Maanen, 2016; van der Linden, 2006; van der Maas, Molenaar, Maris, Kievit, & Borsboom, 2011). Among these, van der Linden (2006) developed a lognormal model to examine the relationship between item responses and latencies. This method was called the “effective response time” (ERT) in Meijer and Sotaridona (2006). ERT is defined as the time required for an examinee to answer an item correctly, and a chi-square distribution is used to check whether the value falls beyond a certain confidence level, given the examinee's ability and the item parameters.
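The two kinds of statistic described above can be illustrated with a minimal sketch. It assumes per-item time thresholds are already chosen for RTE, and it uses the sum of squared standardized log-time residuals under a lognormal response time model (log time with mean beta_i - tau_j and standard deviation 1/alpha_i) as a stand-in for a model-based check; this is not the exact ERT statistic of Meijer and Sotaridona (2006). The function names, parameter names, and all numeric values are hypothetical.

```python
import math


def response_time_effort(times, thresholds):
    """Proportion of items on which the examinee spent at least the threshold time."""
    if len(times) != len(thresholds):
        raise ValueError("times and thresholds must have the same length")
    sufficient = [t >= th for t, th in zip(times, thresholds)]
    return sum(sufficient) / len(sufficient)


def log_time_chi_square(times, alpha, beta, tau):
    """Sum of squared standardized log-time residuals for one examinee.

    Assumes ln(time) on item i is normal with mean beta[i] - tau and
    standard deviation 1 / alpha[i]. If the model holds and the parameters
    are known, the sum is chi-square distributed with len(times) degrees
    of freedom.
    """
    return sum(
        (a * (math.log(t) - (b - tau))) ** 2
        for t, a, b in zip(times, alpha, beta)
    )


# Hypothetical three-item test: thresholds in seconds, item parameters on the log scale.
thresholds = [5.0, 5.0, 5.0]
alpha = [2.0, 2.0, 2.0]
beta = [3.0, 3.0, 3.0]

# One slow-enough response out of three.
print(response_time_effort([2.0, 1.5, 20.0], thresholds))  # 0.333...

# Times pass every threshold but are much shorter than the model expects.
print(log_time_chi_square([6.0, 6.0, 6.0], alpha, beta, tau=0.0))  # about 17.5
```

In the second call the statistic is well above 7.815, the 0.05 critical value of a chi-square distribution with 3 degrees of freedom, even though every response passes the RTE threshold. This is one way the two approaches can disagree.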
The purpose of this study is to investigate whether the RTE method and/or the ERT model can produce “cleaner” data than the data cleaning method currently employed. Three data cleaning procedures are proposed in this study. The first uses only the RTE method, the second uses only the ERT method, and the third uses the two methods together. For the third procedure, after examinees are excluded using the RTE method, the remaining data are examined using the ERT method to determine whether any further examinees need to be excluded.
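The three procedures can be sketched as a simple filter over per-examinee statistics. The sketch assumes each examinee already has an RTE value and a model-based chi-square value; the cutoffs (0.8 for RTE, 7.815 for the chi-square statistic), the field names, and the data are hypothetical and are not taken from the study.

```python
def clean(stats, procedure, rte_cutoff=0.8, chi2_critical=7.815):
    """Return the ids of examinees kept under one cleaning procedure.

    procedure is "rte" (RTE only), "ert" (model-based check only), or
    "both" (RTE first, then the model-based check on the remaining examinees).
    """
    if procedure not in ("rte", "ert", "both"):
        raise ValueError("procedure must be 'rte', 'ert', or 'both'")
    kept = list(stats)
    if procedure in ("rte", "both"):
        # Exclude examinees with too few sufficiently slow responses.
        kept = [s for s in kept if s["rte"] >= rte_cutoff]
    if procedure in ("ert", "both"):
        # Exclude examinees whose response times misfit the timing model.
        kept = [s for s in kept if s["chi2"] <= chi2_critical]
    return [s["id"] for s in kept]


# Hypothetical per-examinee statistics.
stats = [
    {"id": "A", "rte": 1.00, "chi2": 0.1},
    {"id": "B", "rte": 0.33, "chi2": 48.2},
    {"id": "C", "rte": 1.00, "chi2": 17.5},
    {"id": "D", "rte": 0.67, "chi2": 6.9},
]

for procedure in ("rte", "ert", "both"):
    print(procedure, clean(stats, procedure))
# rte ['A', 'C']
# ert ['A', 'D']
# both ['A']
```

Applying the RTE filter first means the model-based check only runs on examinees who passed it, which matches the sequential order described for the third procedure.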
The three data sets resulting from the three procedures are calibrated with the three-parameter logistic (3-PL) IRT model, which is the calibration model currently in use. The results will be compared among the procedures and with the current calibration procedure. The criterion is the number of items that fit the 3-PL IRT model: the more items fit the model under a given procedure, the more preferable that procedure is expected to be.
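For reference, the 3-PL model is commonly written as follows. The notation is an assumption, since the abstract does not give its parameterization: a_i is the discrimination, b_i the difficulty, and c_i the lower asymptote (pseudo-guessing) parameter of item i, and theta_j is the ability of examinee j. Some formulations also multiply the exponent by a scaling constant D = 1.7.

```latex
P(X_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,
  \frac{1}{1 + \exp\bigl(-a_i(\theta_j - b_i)\bigr)}
```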
