Inspired by progress in other fields, political scientists have embraced supervised learning for prediction, inference, measurement and description. In doing so, they often rely on modern, flexible models of considerable complexity that have proved successful outside the social sciences. Yet, as we confirm, there appear to be profound limits to the payoff of such approaches, at least relative to the alternative of using very simple (generalized linear) models for the same tasks. We explain why this is so, how to identify the problems for which it holds, and what to do about it. Specifically, we make a theoretical case that social science data is generally highly structured and constrained because of the way scholars gather observations and variables, especially in tabular data sets. We then use applied probably approximately correct (PAC) learning theory to show that this is true for a large number of real-world use cases. Our approach allows us to diagnose when simple models are optimal, and we provide free software for this purpose. Our ultimate recommendation is straightforward: unless their data is unusually high-dimensional, unstructured and non-tabular, researchers can generally do no better than to use simple models for their tasks. Concerns about "interpretability" and computational cost make these claims all the more true, but our argument does not rely on those considerations.
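To give a rough sense of the PAC logic, the sketch below uses the standard textbook sample-size bound for a consistent learner over a finite hypothesis class in the realizable case, m >= (ln|H| + ln(1/delta)) / epsilon. This is an illustration under stated assumptions, not the authors' diagnostic or their software: the class sizes (2**10 for a "simple" model, 2**1000 for a "flexible" one) are hypothetical stand-ins chosen only to show how the guarantee scales with model complexity.

```python
import math


def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for a consistent learner over a finite class.

    Textbook realizable-case bound (an assumption for illustration):
        m >= (ln|H| + ln(1/delta)) / epsilon
    guarantees true error <= epsilon with probability >= 1 - delta.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be >= 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    # math.log accepts arbitrarily large Python ints, so 2**1000 is fine here.
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


if __name__ == "__main__":
    # Hypothetical class sizes: a small "simple" class versus a huge "flexible" one.
    simple = pac_sample_size(2**10, epsilon=0.05, delta=0.05)
    flexible = pac_sample_size(2**1000, epsilon=0.05, delta=0.05)
    print(simple)    # 199
    print(flexible)  # 13923
```

Under this bound the richer class needs roughly seventy times as many observations for the same guarantee, which is one intuition for why, on modest and highly structured tabular data, a flexible model may buy little over a simple one.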