Social Sciences Summer Course

Working with Large Online Datasets for Social Science Research (using R)

When:

10 August - 14 August 2026

School:

Summer School in Social Sciences Methods

Institution:

Università della Svizzera italiana

City:

Lugano

Country:

Switzerland

Language:

English

Credits:

0 EC

Fee:

800 CHF

About

Workshop contents and objectives

The rapid growth of digital communication, online social platforms, and publicly available web data sources has created new opportunities for social scientists to study behavior, attitudes, and social phenomena on an unprecedented scale. Online data, whether accessed through APIs, publicly available datasets, social media archives, repositories, or web-scraping tools, can provide unique insights into complex topics such as wellbeing, risk perception, health communication, consumer behavior, misinformation, and many others.

This course introduces participants to the foundational concepts, tools, and research practices needed to design and conduct social science projects using large online datasets. It prioritizes practical knowledge, with an emphasis on research design, ethical considerations, discovering online data sources, and introductory methods for accessing, wrangling, and analyzing rich quantitative and qualitative datasets.

By the end of this course, participants will be able to:

Critically evaluate how online data sources can be applied effectively in social science research.
Identify, access, and extract valuable data from various online sources.
Implement data access using APIs and web scraping tools, focusing on ethical, robust, and sustainable data analysis pipelines.
Apply best practices for data wrangling, documentation, and reproducible research workflows.

Workshop design

Each day of the course will consist of two parts:

The morning session is dedicated to foundational concepts and approaches. We'll cover research design, data sources, and ethics, complemented by live demonstrations in R.
The afternoon session will feature guided exercises and individual project work. Every participant will have the opportunity to apply their new knowledge to construct their own research pipeline using large online datasets.
During the course, participants are also welcome to work on their own projects. The instructor will happily assist with any individual project that requires skills and knowledge covered during the course.

Materials (lecture slides, sample datasets, handouts, exercises with solutions, annotated R scripts) will be made openly available via an online repository to all participants.

Detailed lecture plan (daily schedule)

Day 1 – Opportunities and Challenges of Large Online Data in Social Science
Morning:

What is “large online data”? (textual records, metadata, social media, digital trace data, APIs, web archives, open datasets)
Developing research questions with large online datasets
Case studies from political science, psychology, economics, public health
Ethical considerations: digital footprint data, consent, terms of service, reproducibility, anonymity
Afternoon:

Exploring existing online datasets using R
Introduction to common data formats (CSV, JSON, XML)
Exercise: locating and evaluating online datasets for your research question

Day 2 – Obtaining Data Using APIs
Morning:

Understanding APIs
Basic structure of an API request: endpoints, parameters, authentication
Afternoon:

Accessing simple public APIs in R
Working with JSON data
Exercise: retrieving, parsing, and visualising API-based datasets

Day 3 – Working with Large Online Data: Wrangling, Cleaning, and Preparation
Morning:

Typical challenges: duplicates, unstructured text, nested JSON, missing values
Tidy and reproducible data preparation principles for online datasets
Merging text and numeric data in R
Afternoon:

Data wrangling workshop: Developing robust functions to explore and process large online datasets.
Basic text pre-processing pipelines
Mini-challenge exercise: clean a messy dataset about human behaviour and gain insights from it!

Day 4 – When APIs Aren’t Enough: Web Scraping
Morning:

When scraping is appropriate and when it is not
HTML/CSS structure, selectors, and the process of web scraping
Polite scraping: rate limiting, data protection, legal considerations
Afternoon:

Using rvest in R for simple scraping tasks
Extracting structured content from static pages
Practical challenge exercise: Crawling through web store data and building a new database

Day 5 – Combining It All Together: Complete Analysis Pipeline with Large Online Data
Morning:

Designing a robust data workflow
Strategies for documentation, transparency, and reproducibility
Introduction to exploratory analysis and visualisation
Afternoon:

Designing your own data workflow and project pipeline
Mini project presentations

Class materials

All materials will be provided online.

***The Summer School cannot grant credits. We only deliver a Certificate of Participation, i.e. we certify your attendance.

If you are considering using Summer School workshops to obtain credits (ECTS), you will have to check with your home institution (contact the person or institute responsible for your degree) to find out whether they recognise the Summer School and how many credits can be earned from a workshop/course with roughly 35 hours of teaching, no graded work, and no exams.

If this is important to you, make sure to look into it before registering.***

Course leader

Dr Lukasz Walasek is an associate professor at the Department of Psychology, University of Warwick, UK.

Target group

Graduate students, doctoral researchers, early career researchers, and experienced researchers.

Prerequisites

The course is suitable for beginners wanting to explore online data in an applied, conceptual, and practical way.

Participants are expected to have basic computer and statistical analysis skills. Basic familiarity with R is necessary to participate in practical exercises and activities.

Fee info

Fee

800 CHF, Reduced fee: 800 Swiss Francs per weekly workshop for students (requires proof of student status). To qualify for the reduced fee, you must send a copy of an official document that certifies your current student status, or a letter from your supervisor stating your current position as a doctoral or postdoctoral researcher. Send this letter or document by e-mail to methodssummerschool@usi.ch.

Fee

1200 CHF, Normal fee: 1200 Swiss Francs per weekly workshop for all others.
