MOOC Dropout Prediction: How to Measure Accuracy?
Jacob Whitehill, Kiran Mohan, Daniel Seaton, Yigal Rosen, and Dustin Tingley
To obtain reliable accuracy estimates for automatic MOOC dropout predictors, it is important to train and test them in a manner consistent with how they will be used in practice. Yet most prior research on MOOC dropout prediction has measured test accuracy on the same course used for training, which can lead to overly optimistic accuracy estimates. To better understand how accuracy is affected by the training-and-testing regime, we compared the accuracy of a standard dropout prediction architecture (clickstream features + logistic regression) across four different training paradigms. Results suggest that (1) training and testing on the same course ("post-hoc") can significantly overestimate accuracy, and (2) training dropout classifiers using proxy labels based on students' persistence -- which are available before a MOOC finishes -- is surprisingly competitive with post-hoc training (87.33% vs. 90.20% AUC, averaged over 8 weeks of 40 HarvardX MOOCs) and can support real-time MOOC interventions.
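The cross-course evaluation regime described above can be sketched as follows. This is a minimal illustration, not the authors' code: the clickstream feature names and the synthetic data generator are assumptions made only so the example runs end to end, and it trains a logistic-regression dropout classifier on one (synthetic) course while measuring AUC on a different held-out course.

```python
# Sketch of cross-course evaluation for a clickstream + logistic-regression
# dropout predictor. All data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_course(n_students=500):
    # Hypothetical weekly clickstream features: counts of video plays,
    # forum posts, and problem submissions.
    X = rng.poisson(lam=[5.0, 1.0, 3.0], size=(n_students, 3)).astype(float)
    # Assumed generative rule: less active students are likelier to drop
    # out (label 1), via a logistic model on the activity counts.
    logits = 1.5 - 0.2 * X[:, 0] - 0.5 * X[:, 1] - 0.3 * X[:, 2]
    y = (rng.random(n_students) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return X, y

X_train, y_train = make_course()  # the course used for training
X_test, y_test = make_course()    # a different, held-out course

clf = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"cross-course AUC: {auc:.2f}")
```

To mimic the "post-hoc" regime instead, one would score the classifier on held-out students from the same course it was trained on; to mimic the proxy-label regime, `y_train` would be replaced by in-progress persistence labels rather than final dropout outcomes.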