Statistical Approaches for Evaluating Predictive Model Performance using MOOCs

Josh Gardner and Christopher Brooks

Feature extraction and model selection are two essential processes when building predictive models of student success. In this work we describe and demonstrate a statistical approach to both tasks, comparing five modeling techniques (lasso-penalized logistic regression, naïve Bayes, random forest, SVM, and classification tree) across three feature sets (week-only, summed, and appended). We conduct this comparison on a dataset compiled from 30 total offerings of five different MOOCs run on the Coursera platform. Through the use of the Friedman test with a corresponding post-hoc Nemenyi test, we present comparative performance results for the classifiers across the three feature extraction methods, demonstrating a rigorous inferential process intended to guide future analyses of student success systems.
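The Friedman test mentioned above ranks classifiers within each dataset and tests whether the mean ranks differ significantly. A minimal sketch of that step is below; the accuracy values are made-up illustrative numbers (not results from this paper), and only three of the five classifiers are shown for brevity. A Nemenyi post-hoc comparison would typically follow a significant result (for example via `posthoc_nemenyi_friedman` in the third-party scikit-posthocs package), which is omitted here to keep the sketch self-contained.

```python
# Sketch of the Friedman omnibus test over per-dataset classifier accuracies.
# Values are hypothetical, for illustration only.
from scipy.stats import friedmanchisquare

# Accuracy of three classifiers on six course offerings (one list per classifier).
lasso_lr = [0.81, 0.79, 0.84, 0.77, 0.80, 0.83]
rf       = [0.85, 0.82, 0.88, 0.80, 0.84, 0.86]
svm      = [0.78, 0.80, 0.83, 0.75, 0.79, 0.81]

# Friedman test: are the classifiers' mean ranks across datasets equal?
stat, p = friedmanchisquare(lasso_lr, rf, svm)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
# If p is below the chosen alpha, proceed to a post-hoc (e.g., Nemenyi) test
# to identify which classifier pairs differ.
```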