Colchester, United Kingdom

Quantitative Text Analysis and Machine Learning Using R

online course
when 31 July 2023 - 4 August 2023
language English
duration 1 week
credits 4 EC
fee GBP 478

Learn the quantitative text analysis approaches commonly used in the political science literature, how to interpret their results, and how to implement the methods most widely used by social scientists.

Need to know

Prior knowledge of R and quantitative methods is required for this class. In the live sessions, you will receive hands-on instruction on performing quantitative text analysis and machine learning using R.

In depth

Thanks to advances in personal computing and the internet, humans now produce an overwhelming amount of textual data. While social scientists have traditionally relied on quantitative and qualitative content analysis techniques to read and annotate texts manually, recent breakthroughs in computational approaches to natural language processing have made it possible for researchers to manage and analyse much larger datasets. With these new tools, social scientists can now study a range of topics that were previously out of reach due to resource constraints, such as organisational ideology, media bias and sentiment, policy change, social media, and information diffusion.

The goal of this course is to equip students with the knowledge and skills to harness computational approaches to analyse text data. By the end of the course, you will:

• understand the conceptual framework and data management techniques necessary for analysing text as data, including the fundamentals of natural language processing using machine learning;
• be familiar with common computational approaches to text analysis that are widely used in political and social sciences;
• be introduced to advanced methods for managing and analysing text data using embedding approaches.

Overall, the course aims to provide you with the tools and techniques you need to develop projects that utilise computational approaches to text analysis.

Key topics covered

Day 1: Introduction to Quantitative Text Analysis using R

This session will introduce the theoretical assumptions and key concepts for performing quantitative text analysis, and will showcase several features of the quanteda package in R. Through examples, the session will illustrate how researchers can use both quantitative and qualitative analysis to gain insights from textual data.

Day 2: Dictionaries and Sentiment Analysis

Keyword searches and dictionary-based approaches offer a straightforward method for analysing most texts. One practical application is the study of sentiment, as text often contains clues to the writer's tone, emotion, or attitude. In this session, we will provide an overview of commonly used techniques for extracting this information in political contexts.

Day 3: Scaling and Topic Models

Scholars interested in identifying differences between texts based on underlying concepts such as ideology or institutional origin often turn to scaling and topic models. This session will introduce Wordfish and Wordscores, two common scaling models, and provide an overview of approaches for validating their estimates. The session will also cover topic models, which can reveal a set of topics within documents with little prior knowledge of their contents. Specifically, the classical latent Dirichlet allocation (LDA) model and the Structural Topic Model will be discussed. Finally, the session will explore various applications of these tools.

Day 4: Introduction to Classification and Supervised Machine Learning

Supervised approaches to text classification involve using existing annotated text to predict the content of uncoded texts. These methods have enabled major advancements in artificial intelligence and are widely used by organisations like Google and Facebook for analysing massive corpora of text. In addition, these approaches are highly relevant for measuring political concepts. This session will cover the logic of training and evaluating classification models, followed by an exploration of specific applications in political science.

Classification-based approaches can be used to predict values of interest, such as issues and positions, sentiment or emotions, and other dimensions of text that researchers have traditionally derived through content analysis.

Day 5: Advanced Topics in Quantitative Text Analysis

This session will cover a set of advanced topics in quantitative text analysis including word embeddings and textual representations, data management, hypothesis testing and data visualisation.

How the course will work online

This course is structured around lectures and labs, with each session comprising a 1.5-hour lecture followed by a 1.5-hour lab. The lectures will provide an overview of relevant concepts and statistical foundations necessary to apply machine learning approaches to textual data, as well as highlight recent applications of these methods in the social sciences. The lab sessions will allow students to gain hands-on experience by applying these methods to political and social science data using common R packages, including the quanteda suite.

Please note that this course serves as an introduction to these topics. While you will gain a solid understanding of the subject and practical experience, the one-week session will not cover advanced topics in depth.

Course leader

Zachary Greene is a Reader at the University of Strathclyde. His research interests include quantitative text analysis and machine learning approaches to studying political parties, parliaments and elections.

Target group

Researchers, professional analysts, and advanced students.

Course aim

The abundance of textual data in modern times provides a rich source of information on social and political behaviour. As a result, social scientists have increasingly turned to computational or computer-assisted methods to extract insights from this data. This course is designed to equip you with the foundational knowledge and skills required to manage and analyse textual data using R.

Upon completion of the course, you will:

• develop a conceptual understanding of basic approaches to natural language processing;
• gain familiarity with common quantitative text analysis techniques used in political science literature and be able to interpret the results; and
• learn how to implement the most widely used methods employed by social scientists.

The course covers the following topics:

• The fundamentals of quantitative text analysis using machine learning.
• Estimation of dictionary, sentiment, topic, and scaling models.
• Methods for text classification using supervised machine learning.
• Word embeddings and textual representation models.

Credits info

4 EC
You can earn up to four credits for attending this course.
3 ECTS credits – Attend 100% of live sessions and engage fully with class activities.
4 ECTS credits – Attend 100% of live sessions, engage fully with class activities and complete a post-class assignment.

Fee info

GBP 478: ECPR Member
GBP 956: ECPR Non-Member

Scholarships

Funding applications for the 2023 ECPR Summer School in Research Methods and Techniques are now closed.
For more details on funding opportunities for ECPR's other events, please visit our website.