Economics

Data Scraping and Management for Social Scientists with R

When:

17 June - 21 June 2024

School:

Global School in Empirical Research Methods

Institution:

University of St. Gallen

City:

St. Gallen

Country:

Switzerland

Language:

English

Credits:

4.0 EC

Fee:

1100 CHF

Online platforms such as Yelp, Twitter, Amazon, or Instagram are large-scale, rich and relevant sources of data. Researchers in the social sciences increasingly tap into these data for field evidence when studying various phenomena.

In this course, you will learn how to find, acquire, store, and manage data from such sources and prepare them for follow-up statistical analysis for your own research.

After a short introduction to the relevance of data science skills for the social sciences, we will review R as a programming language and its basic data formats. We will then use R to program simple scrapers that systematically extract data from websites, using the packages rvest, httr, and RSelenium, among others. You will also need to learn how to read HTML, CSS, JSON, and XML, to use regular expressions, and to handle string, text, and image data. To store the data, we will look into relational databases, (My)SQL, and related R packages. Many websites, such as Twitter and Yelp, offer convenient application programming interfaces (APIs) that facilitate the extraction of data, and we will look into accessing them from R. Finally, we will highlight some options for feature extraction from images and text, which allow us to augment the collected data with meaningful variables for use in our analysis.

At the end of this course, students should be able to identify valuable online data sources, to write basic scrapers, and to prepare the collected data such that they can use them for statistical analysis as part of their own research projects.

Throughout the course, students will work on a data-scraping project related to their theses. This project will be presented on the final day of the course.

Course leader

Reto Hofstetter

Target group

Master | PhD | Postdoc | Professionals

Fee:

1100 CHF, Master | PhD

Fee:

2000 CHF, Postdoc | Professionals
