Individual Submission Summary

Reconstructing Public Activity on Digital Platforms

Thu, September 5, 2:00 to 3:30pm, Marriott Philadelphia Downtown, 310

Abstract

Digital media companies have increasingly restricted access to data about public activity on their platforms, which limits scholarship, impairs platform accountability, and empowers abusive users. However, nearly all platforms concentrate users in a few high-volume places (e.g., pages, channels, or subreddits). We exploit this concentration to develop new, scalable methods to reconstruct most public user activity on digital platforms, with or without access to platform-provided APIs. We show that our approach works especially well because the most popular places are the most stable over time, and because lower-engagement users participate overwhelmingly in popular places. Platforms also show fractal self-similarity, with subcategories of content mirroring the concentration, stable popularity, and ladder of engagement seen across the platform as a whole.
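
As a rough illustration of the concentration argument, the sketch below computes the share of total activity captured by tracking only the k most active places, and the smallest tracking list that reaches a target share. It is a minimal sketch under stated assumptions, not the authors' R package: the function names `coverage_at_k` and `smallest_list_for` are hypothetical, and the activity counts in the demo are synthetic (a simple 1/rank curve chosen only to mimic a concentrated platform).

```python
from typing import Sequence


def coverage_at_k(activity_counts: Sequence[float], k: int) -> float:
    """Share of total activity captured by the k most active places.

    activity_counts holds one number per place (hypothetical input, e.g.
    posts or comments per page, channel, or subreddit).
    """
    total = sum(activity_counts)
    if total <= 0:
        raise ValueError("activity_counts must sum to a positive number")
    top_k = sorted(activity_counts, reverse=True)[: max(0, k)]
    return sum(top_k) / total


def smallest_list_for(activity_counts: Sequence[float], target: float) -> int:
    """Smallest number of top places whose combined share reaches target."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = sum(activity_counts)
    if total <= 0:
        raise ValueError("activity_counts must sum to a positive number")
    running = 0.0
    for size, count in enumerate(sorted(activity_counts, reverse=True), start=1):
        running += count
        if running >= target * total:
            return size
    return len(activity_counts)


if __name__ == "__main__":
    # Synthetic, assumed data: activity proportional to 1/rank for 1,000 places.
    synthetic = [1000 / rank for rank in range(1, 1001)]
    print(f"Top 50 places cover {coverage_at_k(synthetic, 50):.1%} of activity")
    print(f"Places needed for 80% coverage: {smallest_list_for(synthetic, 0.8)}")
```

The more concentrated the activity, the shorter the list needed for a given share; with real data, the counts would come from whatever per-place activity measure is available for the platform.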

We deploy these methods in an R package, which can estimate total coverage for a scraping list or API collection list of a given size, and calculate how frequently it needs to be updated to minimize undercollection. Our approach makes it feasible to recover large segments of digital platform activity, both for "big picture" overviews of the highest-visibility content, and within smaller topics and

Authors