Individual Submission Summary

Applied Machine Learning: Why Simple Models Are Optimal, and Why It Matters

Thu, September 5, 4:00 to 5:30pm, Pennsylvania Convention Center (PCC), 112A

Abstract

Inspired by progress in other fields, political scientists have embraced the use of supervised learning for prediction, inference, measurement and description. In doing so, they often rely on modern, flexible models of considerable complexity that have proved successful in non-social-science settings. Yet, as we confirm, there appear to be profound limits to the payoff of such approaches, at least relative to the alternative of using very simple (generalized linear) models for such tasks. We explain why this is, how to identify the problems for which this will be true, and what to do about it. Specifically, we make a theoretical case that social science data is generally highly structured and constrained because of the way that scholars gather observations and variables, especially for tabular data sets. We then use applied probably approximately correct (PAC) learning theory to show that this is true for a large number of real-world use cases. Our approach allows us to diagnose when simple models are optimal, and we provide free software for this purpose. Our ultimate recommendation is straightforward: unless their data is unusually high-dimensional, unstructured and non-tabular, researchers can generally do no better than to use only simpler models for their tasks. Our claims are all the more true given concerns about "interpretability" and computation but do not rely on those logics themselves.

Authors