When merging data from different sources, social scientists often rely on fuzzy string matching to determine whether two records refer to the same entity. But for many social science applications, the most commonly used string distance metrics are imperfect, because they capture lexical similarity rather than similarity of meaning (e.g., "Jim" is a better match for "James" than for "Tim", and "USN" is a better match for "Navy" than for "USPS", even though edit distance ranks each pair the other way). Pre-trained text embeddings, by contrast, offer a fast and scalable way to determine whether two strings have similar meaning. In this paper, I show that incorporating embedding-based similarity measures into a probabilistic record linkage procedure yields considerable gains in accuracy and efficiency. Across three applications, I demonstrate that these performance gains can be achieved with only minimal alterations to existing record linkage workflows, and I provide open-source statistical software for researchers to implement the proposed method.
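The contrast between lexical and semantic similarity can be illustrated with a minimal Python sketch. It is not the paper's software or method: `levenshtein` is a standard edit-distance routine written here for illustration, and `TOY_EMBEDDINGS` holds hand-made three-dimensional vectors that are purely hypothetical stand-ins for what a pre-trained embedding model would return.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# ASSUMPTION: made-up vectors for illustration only. A real workflow would
# obtain these from a pre-trained embedding model instead.
TOY_EMBEDDINGS = {
    "Jim":   [0.9, 0.1, 0.0],
    "James": [0.8, 0.2, 0.1],
    "Tim":   [0.1, 0.9, 0.0],
}

# Lexical view: "Tim" looks closer to "Jim" than "James" does.
print(levenshtein("Jim", "Tim"))    # 1
print(levenshtein("Jim", "James"))  # 3

# Semantic view (toy vectors): "James" is closer to "Jim" than "Tim" is.
sim_james = cosine_similarity(TOY_EMBEDDINGS["Jim"], TOY_EMBEDDINGS["James"])
sim_tim = cosine_similarity(TOY_EMBEDDINGS["Jim"], TOY_EMBEDDINGS["Tim"])
print(sim_james > sim_tim)          # True
```

In a linkage workflow, a similarity score of this kind would serve as one comparison feature among several in the probabilistic model, rather than as a standalone match decision.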