Individual Submission Summary

Producing a Social Media-Based Arabic Sentiment Lexicon: Methodology & Data

Sat, September 7, 8:00 to 9:30am, Marriott Philadelphia Downtown, Franklin 5

Abstract

With nearly 3.5 billion Facebook and X (formerly Twitter) users around the globe, the world is producing text data at a historically unprecedented scale. Moreover, social media is increasingly becoming an arena for individuals to express their opinions, share knowledge, and post various material. In the Arab World, around 100 million social media users turn to this medium to express political views, given the authoritarian offline context. This paper introduces a sentiment lexicon that we produced to analyze sentiment in colloquial Arabic, specifically the Egyptian dialect, as posted on social media platforms. To produce the lexicon, we first scraped data from Facebook and X. After data cleaning and tokenization, we ended up with nearly 10,000 unique words. For each word, coders performed two annotation tasks. The first was a polarity annotation task, in which they distributed a total of 10 points per word across three categories: positive, negative, and neutral. The second was a sentiment annotation task, in which each word was coded along six basic human emotions: Happiness, Sadness, Fear, Trust, Disgust, and Anger. For validation, an external data set of 4,000 annotated sentences was used. The validation revealed high accuracy scores across the different sentiment categories. To the best of our knowledge, this lexicon is the first to provide sentiment analysis for the colloquial Egyptian dialect. Compared to other Arabic sentiment lexicons, it is also the first to rely exclusively on graduate students as sentiment coders.
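The polarity annotation scheme described above can be illustrated with a minimal Python sketch. The abstract states only that each coder spreads 10 points per word across positive, negative, and neutral. Everything else here is an assumption made for illustration: averaging allocations across coders, scoring a sentence by the mean over its known tokens, and picking the highest-scoring category as the label. The function names and the placeholder tokens (`word_a`, `word_b`) are hypothetical and are not taken from the paper or its data.

```python
from collections import defaultdict

CATEGORIES = ("positive", "negative", "neutral")
POINTS_PER_WORD = 10  # from the abstract: 10 points shared across three categories


def build_lexicon(annotations):
    """Average coders' point allocations per word and rescale to proportions.

    `annotations` is an iterable of (word, (pos, neg, neu)) pairs, one per coder.
    Averaging across coders is an assumption, not the authors' stated method.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for word, points in annotations:
        if len(points) != len(CATEGORIES) or sum(points) != POINTS_PER_WORD:
            raise ValueError(f"allocation for {word!r} must be 3 values summing to 10")
        for i, p in enumerate(points):
            totals[word][i] += p
        counts[word] += 1
    return {
        word: {
            cat: totals[word][i] / (counts[word] * POINTS_PER_WORD)
            for i, cat in enumerate(CATEGORIES)
        }
        for word in totals
    }


def score_tokens(tokens, lexicon):
    """Mean category proportions over the tokens that appear in the lexicon."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None  # no lexicon coverage for this sentence
    return {cat: sum(h[cat] for h in hits) / len(hits) for cat in CATEGORIES}


def predict_label(tokens, lexicon):
    """Highest-scoring category; falls back to 'neutral' when nothing matches."""
    scores = score_tokens(tokens, lexicon)
    if scores is None:
        return "neutral"
    return max(CATEGORIES, key=lambda cat: scores[cat])


# Placeholder annotations (made up for illustration, not from the lexicon).
example_annotations = [
    ("word_a", (8, 1, 1)),  # coder 1
    ("word_a", (6, 2, 2)),  # coder 2
    ("word_b", (0, 9, 1)),  # coder 1
]
example_lexicon = build_lexicon(example_annotations)
example_label = predict_label(["word_a", "word_b", "unseen"], example_lexicon)
```

Validation against an annotated sentence set could then compare `predict_label` output with the external labels, though the abstract does not say how accuracy was actually computed.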

Authors