Individual Submission Summary

You Should (Probably) Not Split Your Sample

Fri, September 6, 2:00 to 3:30pm, Pennsylvania Convention Center (PCC), 111A

Abstract

In a survey of top political science journals, we identify a widespread practice for testing heterogeneous treatment effects: split the sample on a moderator, estimate a separate model for each subsample, and compare the estimated effects of the moderated variable on the outcome across subsamples. We provide conceptual and mathematical evidence that this practice does not support the comparisons across subsamples that researchers intend. In simulations, we show that splitting the sample is analogous to interacting all variables (not just the moderated variable) with the moderator, a practice that rests on entirely different assumptions about statistical efficiency. We propose two solutions. If the goal is to isolate the effect of the moderated variable conditional on the moderator, researchers should interact the moderator with the moderated variable. If the goal is to estimate variation across groups, researchers should estimate hierarchical models. As a final step, we replicate several prominent articles to show that their empirical conclusions would change significantly had they modeled heterogeneous treatment effects correctly, that is, with interactions or hierarchical models rather than split samples. Our findings highlight the potential consequences of incorrect comparisons in political science research and offer two straightforward solutions to these issues.
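
The claim that splitting the sample is analogous to interacting every variable with the moderator can be illustrated with a minimal sketch. The data below are simulated purely for illustration and are not the authors' simulations; the variable names (`g` for a binary moderator, `x` for the moderated variable, `z` for a control) and the coefficient values are assumptions. The sketch fits ordinary least squares with NumPy on each subsample, then on a fully interacted model, and checks that the two give the same point estimates. It also fits the model in which only `x` is interacted with `g`, which constrains the coefficient on `z` to be equal across groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical simulated data (illustrative assumptions, not from the paper):
# g = binary moderator, x = moderated variable, z = control.
g = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + (0.5 + 0.8 * g) * x + 0.3 * z + rng.normal(size=n)


def ols(X, y):
    """Return least-squares coefficients for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def design(x, z):
    """Design matrix with columns: intercept, x, z."""
    return np.column_stack([np.ones_like(x), x, z])


# Split-sample approach: one regression per value of the moderator.
b0 = ols(design(x[g == 0], z[g == 0]), y[g == 0])
b1 = ols(design(x[g == 1], z[g == 1]), y[g == 1])

# Fully interacted model: every column (including the intercept)
# is interacted with g.
base = design(x, z)
X_full = np.column_stack([base, base * g[:, None]])
b_full = ols(X_full, y)

# Partially interacted model: only x is interacted with g, so the
# coefficient on z is constrained to be the same in both groups.
# Columns: intercept, x, z, g, g*x.
X_part = np.column_stack([np.ones(n), x, z, g, g * x])
b_part = ols(X_part, y)

# The first three full-model coefficients reproduce the g == 0 subsample fit;
# the last three equal the difference between the two subsample fits.
split_matches_full = (
    np.allclose(b0, b_full[:3]) and np.allclose(b1 - b0, b_full[3:])
)
print("split sample == fully interacted:", split_matches_full)
print("g*x coefficient, fully interacted:    ", b_full[4])
print("g*x coefficient, partially interacted:", b_part[4])
```

The equivalence shown here concerns point estimates only. Standard errors differ in general, because the split-sample fits estimate a separate residual variance for each group while the pooled interacted model estimates one. A hierarchical model, the abstract's second proposal, is not covered by this sketch.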

Authors