CNODES | Can we train machine learning methods to outperform the high-dimensional propensity score algorithm?

Posted 28 Aug at 12:51h by CNODES Admin

Overview

Description

Retrospective health care claims datasets are frequently criticized for lacking complete information on potential confounders. The high-dimensional propensity score algorithm reduces bias by using patients' health status-related information from claims data as surrogates, or proxies, for mismeasured and unobserved confounders. Using a previously published cohort study of post-myocardial infarction statin use (1998-2012), we compare the performance of the algorithm with several popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator (LASSO), and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also showed that a hybrid of the machine learning and high-dimensional propensity score algorithms generally performs slightly better than either approach alone in terms of mean squared error when a bias-based analysis is used.
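To make the idea of bias-based covariate prioritization more concrete, the following is a minimal sketch of how binary proxy covariates might be ranked before they enter a propensity score model. It assumes the Bross bias multiplier commonly associated with the high-dimensional propensity score algorithm; the description above does not spell out the ranking rule, so this should not be read as the study's implementation. The function names (`bross_bias_multiplier`, `rank_covariates`) and the toy data are hypothetical and exist only for illustration.

```python
import math


def bross_bias_multiplier(exposure, outcome, covariate):
    """Approximate confounding bias from leaving out one binary covariate.

    All inputs are equal-length lists of 0/1 values.
    Assumption: uses the Bross formula; returns 1.0 (no apparent bias)
    when a prevalence or risk needed by the formula is undefined.
    """
    n = len(exposure)
    n_exposed = sum(exposure)
    n_unexposed = n - n_exposed
    n_cov = sum(covariate)
    n_nocov = n - n_cov
    if min(n_exposed, n_unexposed, n_cov, n_nocov) == 0:
        return 1.0

    # Covariate prevalence among exposed and unexposed patients.
    p_c1 = sum(c for e, c in zip(exposure, covariate) if e == 1) / n_exposed
    p_c0 = sum(c for e, c in zip(exposure, covariate) if e == 0) / n_unexposed

    # Outcome risk with and without the covariate.
    risk_cov = sum(y for y, c in zip(outcome, covariate) if c == 1) / n_cov
    risk_nocov = sum(y for y, c in zip(outcome, covariate) if c == 0) / n_nocov
    if risk_cov == 0 or risk_nocov == 0:
        return 1.0

    # Covariate-outcome relative risk, inverted if protective so that
    # the strength of association is always expressed as >= 1.
    rr_cd = risk_cov / risk_nocov
    if rr_cd < 1:
        rr_cd = 1 / rr_cd

    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def rank_covariates(exposure, outcome, covariates, k):
    """Return the names of the k covariates with the largest |log(bias)|."""
    scores = {
        name: abs(math.log(bross_bias_multiplier(exposure, outcome, values)))
        for name, values in covariates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented toy data: 8 patients, 2 candidate proxy covariates.
exposure = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 0, 1, 0, 0, 1]
covariates = {
    "strong": [1, 1, 1, 0, 1, 0, 0, 0],  # linked to exposure and outcome
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],   # balanced across both
}

top = rank_covariates(exposure, outcome, covariates, k=1)
print(top)
```

In a hybrid approach of the kind described above, a ranking like this could plausibly serve as a pre-screening step, with a penalized regression or random forest then refining the shortlisted covariates; that ordering is an assumption of this sketch rather than a detail given here.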

Manuscripts

Karim ME, Pang M, Platt RW. Can we train machine learning methods to outperform the high-dimensional propensity score algorithm? Epidemiology. 2018 Mar;29(2):191-198.

Presentations

Project Team

Project Lead

Robert W. Platt PhD

CPRD

Collaborator

Mohammad Ehsanul Karim PhD

CPRD

Collaborator

Menglan Pang MSc

CPRD
