In manual content analysis, researchers read and categorize individual text items directly, making inferences from each text to understand what it communicates. When the corpus is large, however, human interpretation of every individual item is intractable. Topic models are a predominant machine learning technique for addressing this problem: they aggregate lexical data into legible categories, a form of automated qualitative content analysis. Something is lost in this aggregation, however. Corpus-analysis techniques like topic models operate over surface forms alone, while the meaning and import of an utterance are often underdetermined by the utterance itself. This limitation constrains the kinds of inferences such models can support---the import of uttering ``We are the 99\%'' is not fully captured by its component parts.
We introduce a framework for interpreting text data at scale by explicitly augmenting observed text with implicitly communicated content. Specifically, we develop an LLM-based method to decompose both the explicit and implicit propositional content of language utterances. Just as a topic model treats a document as a mixture of underlying topics, we treat an utterance as communicating multiple underlying propositions (which in the case above might include ``Wealth should be redistributed''). We use an LLM to generate such inferences, then embed and cluster them. When inspecting these clusters, crowdworkers discover narratives in an opinion corpus that align with those found by a more laborious manual expert process. In a separate setting, representing U.S. Senators by collections of propositions inferred from their tweets enables improved models of their voting behavior.
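The embed-and-cluster step described above can be illustrated with a toy sketch. Everything here is an assumption for exposition: the proposition strings stand in for LLM-generated inferences, the bag-of-words vectors stand in for whatever sentence embeddings the method actually uses, and the greedy threshold clustering stands in for the real clustering procedure, none of which the abstract specifies.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words embedding. A real pipeline would presumably use a
    # neural sentence encoder (assumption); any vector space works here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(propositions, threshold=0.4):
    # Greedy single-pass clustering: attach each proposition to the first
    # cluster whose seed item is similar enough, else start a new cluster.
    clusters = []
    for p in propositions:
        v = embed(p)
        for c in clusters:
            if cosine(v, embed(c[0])) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

# Hypothetical propositions an LLM might infer from protest slogans.
props = [
    "wealth should be redistributed",
    "wealth must be redistributed fairly",
    "political elites ignore ordinary citizens",
]
groups = cluster(props)
# The two redistribution propositions share enough vocabulary to land in
# one cluster; the elites proposition starts its own.
```

Inspecting such clusters (rather than raw documents) is what lets annotators surface narratives like redistribution directly, even when no single utterance states them explicitly.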