In manual content analysis, researchers read and categorize individual text items directly, making inferences from each text to understand what it communicates. When the corpus is large, however, human interpretation of every individual item is intractable. Topic models are a predominant machine learning technique for addressing this problem: they aggregate lexical data into legible categories, a form of automated qualitative content analysis. Something is lost in this aggregation, however. Corpus-analysis techniques like topic models operate over surface forms alone, while the meaning and import of an utterance are often underdetermined by the utterance itself. This limitation constrains the kinds of inferences such models can support---the import of uttering ``We are the 99\%'' is not fully captured by its component parts.
We introduce a framework for interpreting text data at scale by explicitly augmenting observed text with implicitly communicated content. Specifically, we develop an LLM-based method to decompose both the explicit and implicit propositional content of language utterances. Just as a topic model treats a document as a mixture of underlying topics, we treat an utterance as communicating multiple underlying propositions (which in the case above might include ``Wealth should be redistributed''). We use an LLM to generate such inferences, then embed and cluster them. When inspecting these clusters, crowdworkers discover narratives in an opinion corpus that align with those found by a more laborious manual expert process. In a separate setting, representing U.S. Senators by collections of propositions inferred from their tweets enables improved models of their voting behavior.
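The embed-and-cluster step described above can be illustrated with a toy sketch. Everything here is an assumption for exposition: the proposition strings stand in for LLM-generated inferences, the bag-of-words vectors stand in for whatever sentence embeddings the method actually uses, and the greedy threshold clustering stands in for the real clustering procedure, none of which the abstract specifies.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words embedding. A real pipeline would presumably use a
    # neural sentence encoder (assumption); any vector space works here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(propositions, threshold=0.4):
    # Greedy single-pass clustering: attach each proposition to the first
    # cluster whose seed item is similar enough, else start a new cluster.
    clusters = []
    for p in propositions:
        v = embed(p)
        for c in clusters:
            if cosine(v, embed(c[0])) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

# Hypothetical propositions an LLM might infer from protest slogans.
props = [
    "wealth should be redistributed",
    "wealth must be redistributed fairly",
    "political elites ignore ordinary citizens",
]
groups = cluster(props)
# The two redistribution propositions share enough vocabulary to land in
# one cluster; the elites proposition starts its own.
```

Inspecting such clusters (rather than raw documents) is what lets annotators surface narratives like redistribution directly, even when no single utterance states them explicitly.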