Individual Submission Summary

How Many Is Enough? Sample Size in Staggered Difference-in-Differences Designs

Fri, September 6, 10:00 to 11:30am, Pennsylvania Convention Center (PCC), 110B

Abstract

In difference-in-differences designs with staggered treatment timing and dynamic treatment effects, the two-way fixed effects estimator fails to recover an interpretable causal estimate. A large number of estimators have been proposed to remedy this issue. The flexibility of these estimators, however, increases their variance, which can yield tests with low statistical power. As a consequence, small effects are unlikely to be discovered. Moreover, under low power, when a statistically significant estimate is recovered, it is often wrongly signed and/or greatly exaggerated. Using simulations on real-world data on the US states, we show that even with large effect sizes, none of the recently developed estimators for staggered difference-in-differences produces statistical tests that achieve 80% power. Further, conditional on statistical significance, when the intervention generates weak effects, all estimators recover the wrong sign in approximately 10% of the simulations and overestimate the true effect by several hundred percent on average. We use data on publicly traded firms to investigate what sample size is needed for a staggered difference-in-differences analysis to be informative. We find that even with a very large effect size of 10%, the most efficient estimators need 1,000 units to achieve reasonable power. We conclude with a discussion of how this type of ‘design analysis’ ought to be used by researchers before estimating staggered difference-in-differences models. We also discuss how power may, under certain conditions, be improved by re-designing a study, e.g., by examining county-level outcomes under state-level interventions.
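A minimal sketch of the kind of simulation-based design analysis the abstract describes. Everything here is an illustrative assumption, not the paper's actual procedure: the data-generating process is a toy panel with unit and time fixed effects, a constant treatment effect, and staggered adoption; the estimator is a simple cohort-by-cohort post-minus-pre comparison against never-treated units; and the sign-error and exaggeration summaries are analogues of Gelman and Carlin's Type S / Type M diagnostics.

```python
import numpy as np

def simulate_power(n_units=50, n_periods=10, tau=0.1, sd=1.0,
                   n_sims=500, alpha_z=1.96, seed=0):
    """Monte Carlo design analysis for a toy staggered DiD.

    Returns (power, sign_error_rate, avg_exaggeration), the last two
    computed conditional on statistical significance. Assumes tau != 0.
    """
    rng = np.random.default_rng(seed)
    sig, wrong_sign, ratios = 0, 0, []
    for _ in range(n_sims):
        # Staggered adoption: units adopt at t = 4, 6, or 8, or never.
        g = rng.choice([4, 6, 8, np.inf], size=n_units)
        unit_fe = rng.normal(0, 1, n_units)
        time_fe = rng.normal(0, 1, n_periods)
        t = np.arange(n_periods)
        d = (t[None, :] >= g[:, None]).astype(float)  # treatment dummy
        y = (unit_fe[:, None] + time_fe[None, :] + tau * d
             + rng.normal(0, sd, (n_units, n_periods)))
        # Toy estimator: each treated unit's post-minus-pre change,
        # benchmarked against never-treated units over the same window.
        never = np.isinf(g)
        effects = []
        for i in np.where(~never)[0]:
            gi = int(g[i])
            delta_i = y[i, gi:].mean() - y[i, :gi].mean()
            delta_c = y[never, gi:].mean() - y[never, :gi].mean()
            effects.append(delta_i - delta_c)
        effects = np.asarray(effects)
        est = effects.mean()
        # Naive SE treating unit-level effects as independent (toy only).
        se = effects.std(ddof=1) / np.sqrt(len(effects))
        if abs(est / se) > alpha_z:
            sig += 1
            wrong_sign += est * tau < 0            # Type S analogue
            ratios.append(abs(est) / abs(tau))     # Type M analogue
    power = sig / n_sims
    type_s = wrong_sign / sig if sig else float("nan")
    exaggeration = float(np.mean(ratios)) if ratios else float("nan")
    return power, type_s, exaggeration
```

Run before collecting or analyzing data: if power is far below 80% at a plausible effect size, significant estimates from the design will tend to be exaggerated and occasionally wrongly signed, which is the argument the abstract makes for doing this check up front.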

Authors