Language models (LMs) have become a vital component of the political science research workflow. Comparatively little work exists, however, on estimating LM uncertainty and incorporating that variance in downstream analysis. We show that this gap leads scholars to treat estimated quantities as known, resulting in attenuation bias and/or loss of efficiency in subsequent regression coefficients. Crucially, the usual fixes for estimating uncertainty, e.g., repeated independent judgments by human coders, are not available. What is more, in many cases it is unclear whether humans (researchers or coders) or LMs are more expert. To calibrate the problem empirically, we compare LM classifications and their associated reliability to "gold standard" data for which we have known human inter-coder reliability and confidence. We then provide a framework for incorporating the uncertainty we elicit from LMs into subsequent analysis. Finally, we offer best-practice advice for "edge cases" where the researcher lacks expertise and is (fully) reliant on the LM's judgment.
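To make the attenuation-bias point concrete, the sketch below simulates a downstream regression in which an LM-predicted binary label stands in for the true covariate. It is a minimal illustration under assumed values (accuracy of 0.85, a true effect of 1.0, a stand-in "elicited probability"), not the paper's actual framework or data.

```python
# Minimal simulation sketch (illustrative assumptions, not the paper's method):
# treating an LM's classification as the true covariate attenuates the
# downstream regression coefficient, and the naive analysis ignores the
# classification uncertainty entirely.
import numpy as np

rng = np.random.default_rng(0)
n, beta, accuracy = 5_000, 1.0, 0.85  # assumed sample size, effect, LM accuracy

# True binary label (e.g., whether a document discusses the economy) and outcome.
x_true = rng.binomial(1, 0.5, size=n)
y = beta * x_true + rng.normal(size=n)

# LM label: correct with probability `accuracy`, flipped otherwise.
x_lm = np.where(rng.random(n) < accuracy, x_true, 1 - x_true)

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

print(f"slope on true labels: {ols_slope(x_true, y):.3f}")  # ~1.00
print(f"slope on LM labels:   {ols_slope(x_lm, y):.3f}")    # ~0.70, attenuated

# One hedged way to surface the ignored uncertainty: if the LM also reports a
# calibrated probability that each label is 1, draw plausible label vectors and
# see how much the estimate moves across draws. That spread is variance the
# naive analysis silently sets to zero.
p_lm = np.where(x_lm == 1, accuracy, 1 - accuracy)  # stand-in for elicited confidence
draws = np.array([ols_slope(rng.binomial(1, p_lm), y) for _ in range(200)])
print(f"between-draw sd of slope: {draws.std():.3f}")
```

The second block of the sketch only quantifies the extra variance; it does not by itself remove the attenuation, which is part of what a fuller framework for propagating LM uncertainty has to address.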