Hasidic stories offer an unparalleled window into the daily lives, social networks, and cultural milieu of Jews in 19th- and 20th-century Eastern Europe. The genre is populated by a varied cast of characters: leaders (Tsadikim) and lay people (Hasidim), Jews and non-Jews, men and women, identified and anonymous persons, ordinary and mythical figures. Once transformed into structured data, this diverse cast holds immense potential for novel insights into historical and occupational trends, family structures, and intersecting identities, among other topics.
Under a project funded by the Israel Science Foundation (ISF), we are building an annotated corpus of Hasidic stories that appeared in editions published by Eastern European printing houses between 1814 and 1914. In addition to tagging motifs and themes in the stories and recognizing the places mentioned in them, we manually annotated people in both the paratext and the texts of the stories. An additional grant from the Open University of Israel enables us to begin an exploratory project on automating annotation with both natural language processing (NLP) tools and large language models (LLMs).
This paper examines the key challenges and opportunities surrounding the annotation of people in such a multilayered, culturally embedded textual tradition. For example, how should we differentiate between historical and literary characters? How should we annotate and extract unnamed literary characters who are characterized only by a specific detail of their lives (such as their place of origin or their role in the family)? How should we integrate the characters in the stories with the people named in the metadata and paratexts (printers, publishers, etc.)?
We will share sample annotations that highlight these issues, describe our iterative manual and automated workflows, and discuss the evaluation of automated output against human-annotated gold standards. The lessons we have learned contribute to larger debates on person and entity annotation for literary corpora and on digital cultural heritage preservation.
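As an illustration of what evaluation against a human-annotated gold standard can look like, the following minimal sketch computes exact-match precision, recall, and F1 over person annotations. It is not the project's actual pipeline: the `PersonSpan` record, its fields, the role labels, and the `score` function are all hypothetical names introduced here, and exact span matching is only one of several possible matching criteria.

```python
from typing import Iterable, NamedTuple


class PersonSpan(NamedTuple):
    """Hypothetical record for one annotated person mention."""
    story_id: str   # assumed identifier of the story
    start: int      # assumed character offset where the mention begins
    end: int        # assumed character offset where the mention ends
    role: str       # assumed label, e.g. "tsadik" or "anonymous"


def score(gold: Iterable[PersonSpan], predicted: Iterable[PersonSpan]) -> dict:
    """Exact-match precision, recall, and F1 of predictions against gold."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented toy data: four gold mentions, of which the automated
# annotator recovers two exactly and misses the other two.
gold = [
    PersonSpan("story-1", 0, 12, "tsadik"),
    PersonSpan("story-1", 40, 52, "hasid"),
    PersonSpan("story-2", 5, 20, "anonymous"),
    PersonSpan("story-2", 60, 71, "printer"),
]
predicted = [
    PersonSpan("story-1", 0, 12, "tsadik"),
    PersonSpan("story-2", 5, 20, "anonymous"),
]

result = score(gold, predicted)
print(result)
```

Exact matching is strict: a prediction with the right span but the wrong role counts as both a false positive and a false negative. For unnamed characters identified only by a descriptive detail, a more lenient criterion (for example, overlapping spans) may be more appropriate.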
By transforming these Hasidic storytelling traditions into FAIR (findable, accessible, interoperable, reusable) data, we enable novel computational and humanistic research not only for scholars of Hasidism, but also for digital humanities, sociohistorical analysis, literary studies, and minority culture preservation worldwide.