Researchers frequently need to extract information, such as events or target topics, from large corpora. One common solution is to apply semantically related keywords to identify tweets, news articles, or other documents of interest. However, dictionaries relevant to a given topic, event, or language rarely both exist and are accessible. Moreover, existing algorithms for extracting dictionaries require many user-provided seed words or hand-coded documents to generate useful results, and they do not incorporate contextual information from natural language. In this paper, I present a novel algorithm, conclust, that extracts keywords from unlabeled text using a small number of user-provided seed words and a fitted word embeddings model. Compared to existing methods of lexicon extraction, conclust requires few seed words, is computationally efficient, and takes word context into account. I describe the algorithm's properties and benchmark its performance against existing methods of lexical dictionary extraction, comparing user labor, conceptual clarity, and the ability to replicate existing keyword dictionaries.
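
To make the general idea of seed-word expansion with word embeddings concrete, the sketch below grows a keyword set from a few seeds by cosine similarity. It is not the conclust algorithm, whose details the abstract does not give. The function names (`cosine`, `expand_seeds`), the mean-similarity rule, the threshold value, and the toy three-dimensional vectors are all assumptions chosen for illustration; in practice the vectors would come from a fitted embeddings model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_seeds(embeddings, seeds, threshold=0.8, max_rounds=5):
    """Grow a keyword set from seed words (hypothetical rule, not conclust).

    A candidate word joins the set when its mean cosine similarity to the
    current keywords is at least `threshold`. Expansion repeats until no
    word is added or `max_rounds` is reached. Every seed must be a key of
    `embeddings`.
    """
    keywords = set(seeds)
    for _ in range(max_rounds):
        added = set()
        for word, vec in embeddings.items():
            if word in keywords:
                continue
            mean_sim = sum(
                cosine(vec, embeddings[k]) for k in keywords
            ) / len(keywords)
            if mean_sim >= threshold:
                added.add(word)
        if not added:
            break  # the keyword set has stopped growing
        keywords |= added
    return keywords


# Toy vectors invented for this example; they are not output of a real model.
TOY_EMBEDDINGS = {
    "protest": [0.9, 0.1, 0.0],
    "rally": [0.85, 0.2, 0.05],
    "march": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.95],
    "recipe": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    print(sorted(expand_seeds(TOY_EMBEDDINGS, ["protest"])))
```

With the single seed "protest", the toy vocabulary yields "march" and "rally" as additional keywords, while the unrelated words stay out. Averaging similarity over the whole current set, instead of matching against any single keyword, is one simple way to limit drift away from the seed concept as the set grows.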