Individual Submission Summary

Evaluating RAG Systems for Information Retrieval from Political Corpora

Thu, September 5, 8:00 to 9:30am, Marriott Philadelphia Downtown, 405

Abstract

Rapid advances in the capabilities of large language models (LLMs) have revolutionized the field of natural language processing (NLP) and ushered in a new era of research in computational linguistics, machine learning, and artificial intelligence. One burgeoning area of NLP research augments the capabilities of LLMs with information retrieval (IR) systems. This approach, known as Retrieval-Augmented Generation (RAG), combines the strengths of both IR and LLMs to create a system that answers questions about a given topic by retrieving relevant documents from a corpus and then generating an answer to the question based on those documents.
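A minimal sketch of the retrieve-then-generate loop described above, not the implementation evaluated in the paper: retrieval here uses TF-IDF cosine similarity over a toy legislative corpus, and the LLM call is stubbed out (the corpus, the retrieve and generate_answer functions, and the prompt template are all illustrative assumptions).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for a collection of debates, bills, and rules.
corpus = [
    "Bill C-11 amends the Broadcasting Act and was debated in committee.",
    "Standing Order 78 governs time allocation for debate on a bill.",
    "The budget implementation act received royal assent in June.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k corpus documents most similar to the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [corpus[i] for i in top]

def generate_answer(question: str, passages: list[str]) -> str:
    """Build the prompt that conditions the LLM on the retrieved passages.

    In a real RAG system this prompt would be sent to an LLM; here the
    prompt itself is returned so the sketch stays self-contained.
    """
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    q = "Which standing order controls time allocation?"
    print(generate_answer(q, retrieve(q)))
```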

In this paper, I propose a political science application of RAG systems: "Citizen" LLMs, each trained on publicly available text data comprising debates, bills, and rules, that answer questions about a country's legislative history and the functioning of its legislative system. I evaluate the performance of several RAG systems on a large corpus of legislative text and present preliminary results that demonstrate the potential of RAG systems to answer questions about politics in natural language. I also explore an alternative approach that pairs a larger, smarter LLM with a smaller, domain-specific LLM trained to answer questions about a specific political domain. Rather than treating LLMs and databases as separate entities, this approach treats the LLMs themselves as a database of knowledge that can be queried by other LLMs. I argue that this approach can be scaled up into a system of expert LLMs, each trained on a specific political domain, which a larger, smarter LLM queries to answer questions about politics in natural language.
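A minimal sketch of the "expert LLMs queried by a larger LLM" idea described above, under the assumption that each expert is exposed as a callable the larger model can route to; the expert registry, the routing rule, and the ask_* functions are hypothetical placeholders rather than the paper's architecture.

```python
from typing import Callable

def ask_procedure_expert(question: str) -> str:
    # Stand-in for a small LLM fine-tuned on legislative rules and procedure.
    return f"[procedure-expert answer to: {question}]"

def ask_debates_expert(question: str) -> str:
    # Stand-in for a small LLM fine-tuned on debate transcripts.
    return f"[debates-expert answer to: {question}]"

# Registry of domain experts the larger model can treat as queryable knowledge.
EXPERTS: dict[str, Callable[[str], str]] = {
    "procedure": ask_procedure_expert,
    "debates": ask_debates_expert,
}

def route(question: str) -> str:
    # Toy keyword routing; in practice the larger, general-purpose LLM would
    # decide which expert to consult (e.g., by emitting the expert's name).
    return "procedure" if "rule" in question.lower() else "debates"

def answer(question: str) -> str:
    expert_reply = EXPERTS[route(question)](question)
    # The larger LLM would then synthesize a final natural-language answer
    # from the expert's reply; here the reply is returned directly.
    return expert_reply

print(answer("Which rule limits the length of second-reading debate?"))
```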

Author