How can one trace the spread of information, ideas, and narratives across the world using text data? Social scientists have long sought to answer this question, which requires identifying pairs of documents that contain statements with the same underlying meaning about the same subject. Approaches that rely on n-gram matching or topic modeling have so far yielded only a loose approximation of this ideal. We propose a two-stage method to track the global diffusion of information: first, we apply a highly scalable method called locality sensitive hashing (LSH) to cross-language embedded representations of text produced by a large language model (LLM), generating a relatively small number of candidate pairs; then we fine-tune an instruction-tuned LLM to identify which of those candidate sentence pairs actually contain the same idea. Creating a gold-standard labeled data set to evaluate performance on this pairwise problem is extremely difficult; we do so by constructing a data set of thousands of benchmark sentence pairs that contain variants of equivalent and different statements about the same and different topics. Our method has far higher recall than verbatim text-reuse methods and is more precise than topic modeling.
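The candidate-generation stage can be illustrated with a minimal sketch. The abstract does not say which LSH family the authors use; the code below assumes random-hyperplane LSH (often called SimHash), a standard choice for cosine similarity between embeddings, combined with banding. It also assumes sentence embeddings from some cross-lingual model are already available as lists of floats. The function names (`random_hyperplanes`, `signature`, `lsh_candidates`) and the parameters `n_bands` and `rows_per_band` are hypothetical, and the second stage (the fine-tuned LLM that labels each candidate pair) is not reproduced here.

```python
import random
from collections import defaultdict
from itertools import combinations

# Assumption: random-hyperplane LSH with banding; the paper's exact LSH
# variant and parameters are not specified in the abstract.


def random_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random Gaussian normal vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side, else 0.

    Vectors separated by a small angle agree on most bits, so the bit
    signature approximates cosine similarity between embeddings.
    """
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0.0 else 0
        for plane in planes
    )


def lsh_candidates(embeddings, n_bands=8, rows_per_band=4, seed=0):
    """Return index pairs (i, j), i < j, that share at least one LSH band.

    Each signature is split into n_bands bands of rows_per_band bits; two
    vectors become a candidate pair if any band matches exactly. Only
    vectors that land in a common bucket are paired, so the expensive
    second-stage classifier never has to score all O(n^2) pairs.
    """
    dim = len(embeddings[0])
    planes = random_hyperplanes(dim, n_bands * rows_per_band, seed)
    buckets = defaultdict(list)
    for idx, vec in enumerate(embeddings):
        sig = signature(vec, planes)
        for b in range(n_bands):
            band = sig[b * rows_per_band:(b + 1) * rows_per_band]
            buckets[(b, band)].append(idx)  # indices are appended in order
    candidates = set()
    for members in buckets.values():
        for i, j in combinations(members, 2):  # yields i < j
            candidates.add((i, j))
    return candidates


if __name__ == "__main__":
    rng = random.Random(42)
    base = [rng.gauss(0.0, 1.0) for _ in range(64)]
    near = [x + rng.gauss(0.0, 0.05) for x in base]  # stand-in for a translation
    other = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for an unrelated sentence
    print(lsh_candidates([base, near, other]))
```

Raising `rows_per_band` makes a band match harder, so fewer unrelated pairs survive but more true pairs can be missed; adding bands has the opposite effect. Because some unrelated pairs can still collide, the LSH output is only a shortlist that a second-stage classifier must then verify.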
This approach can be applied to the study of propaganda, misinformation, and the diffusion of innovations. In this paper, we apply it to show how U.S. media sources reuse information from Russian state media in the context of the 2022 Russian invasion of Ukraine, for example, accusations that Ukraine is developing bioweapons.
Hannah Waight, New York University
Megan Brown, University of Michigan
Jason Greenfield, New York University
Kevin Aslett, University of Central Florida
Margaret E Roberts, University of California, San Diego
Anton Shirikov, University of Kansas
Jonathan Nagler, New York University
Joshua A. Tucker, New York University
Solomon Messing, New York University