Can hyperparameter tuning improve the performance of a super learner? A case study

Abstract

BACKGROUND: Super learning is an ensemble machine learning approach increasingly used as an alternative to classical prediction techniques. When implementing super learning, however, failing to tune the hyperparameters of the component algorithms may adversely affect the performance of the super learner.

METHODS: In this case study, we used data from a Canadian electronic prescribing system to predict when primary care physicians prescribed antidepressants for indications other than depression. The analysis included 73,576 antidepressant prescriptions and 373 candidate predictors. We derived two super learners: one using hyperparameter values for each machine learning algorithm tuned through an iterative grid search procedure, and the other using the default values. We compared the performance of the tuned super learner to that of the super learner using default values ("untuned") and to a carefully constructed logistic regression model from a previous analysis.

RESULTS: The tuned super learner had a scaled Brier score (R²) of 0.322 (95% CI 0.267-0.362). In comparison, the untuned super learner had a scaled Brier score of 0.309 (95% CI 0.256-0.353), corresponding to an efficiency loss of 4% (relative efficiency 0.96; 95% CI 0.93-0.99). The previously derived logistic regression model had a scaled Brier score of 0.307 (95% CI 0.245-0.360), corresponding to an efficiency loss of 5% relative to the tuned super learner (relative efficiency 0.95; 95% CI 0.88-1.01).

CONCLUSIONS: In this case study, hyperparameter tuning produced a super learner that performed slightly better than an untuned super learner. Tuning the hyperparameters of the individual algorithms in a super learner may help optimize performance.
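To illustrate the general idea of tuning the candidate algorithms before combining them, the sketch below uses scikit-learn's grid search and a stacking ensemble as a stand-in for a super learner. It is a minimal, hypothetical example: the estimators, hyperparameter grids, and simulated data are assumptions for illustration and do not reproduce the authors' 373-predictor pipeline or their iterative grid search.

```python
# Illustrative sketch only; the estimators, grids, and data are hypothetical
# stand-ins, not the study's actual super learner implementation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

# Tune one candidate algorithm's hyperparameters with cross-validated grid search,
# scored on the Brier score (negated, since scikit-learn maximizes scores).
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [200, 500], "max_features": ["sqrt", 0.3]},
    scoring="neg_brier_score",
    cv=5,
)
rf_grid.fit(X, y)

# Combine the tuned learner with other candidates via a meta-learner, mimicking
# the super learner idea of weighting algorithms by their cross-validated fit.
super_learner = StackingClassifier(
    estimators=[
        ("rf_tuned", rf_grid.best_estimator_),
        ("logit", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
super_learner.fit(X, y)
```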
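The scaled Brier score reported above is conventionally defined as one minus the ratio of the model's Brier score to that of a non-informative reference model that predicts the observed outcome prevalence for everyone, and the relative efficiencies quoted in the results are consistent with ratios of scaled Brier scores (e.g., 0.309/0.322 ≈ 0.96). A minimal sketch under those standard definitions, with simulated predictions (the paper's exact reference model and bootstrap confidence intervals are not reproduced):

```python
import numpy as np

def scaled_brier(y_true: np.ndarray, p_hat: np.ndarray) -> float:
    """Scaled Brier score (R²): 1 - Brier(model) / Brier(reference), where the
    reference model predicts the observed outcome prevalence for everyone."""
    brier_model = np.mean((p_hat - y_true) ** 2)
    p_ref = np.mean(y_true)
    brier_ref = np.mean((p_ref - y_true) ** 2)
    return 1.0 - brier_model / brier_ref

# Hypothetical predicted probabilities from two models on the same held-out data.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=1000)
p_tuned = np.clip(0.3 + 0.4 * (y - 0.3) + rng.normal(0, 0.1, size=1000), 0.01, 0.99)
p_untuned = np.clip(0.3 + 0.3 * (y - 0.3) + rng.normal(0, 0.1, size=1000), 0.01, 0.99)

r2_tuned = scaled_brier(y, p_tuned)
r2_untuned = scaled_brier(y, p_untuned)

# Relative efficiency of the untuned versus the tuned model, as in the abstract
# (a ratio of 0.96 corresponds to a 4% efficiency loss).
relative_efficiency = r2_untuned / r2_tuned
```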

Publication
Epidemiology