Lugano, Switzerland

Web Scraping and Data Mining with R

when 12 August 2024 - 16 August 2024
language English
duration 1 week
fee CHF 700

Workshop contents and objectives

The widespread accessibility of the internet and the ongoing digitisation of information have transformed how people communicate and share knowledge. These changes have opened new possibilities for social scientists, who can now use large sources of publicly available online data to gain new insights about individuals and their social environment. Creative uses of unsolicited online data can provide a unique window into people’s behaviours, attitudes, and beliefs in the context of significant social, political, and economic realities. Such “big data” approaches have led to significant advances in our understanding of the dynamics of political ideology, risk attitudes, health, wellbeing, misinformation, consumer behaviour, and sustainability, among many others.

The aim of this course is to introduce the core concepts and methodologies for web scraping and data mining in R. During the sessions, participants will learn about the sources and types of online data that can be accessed by social scientists. They will also develop new skills in scraping, harvesting, and extracting such data. In a series of practical exercises and activities, participants will learn about managing their own big data science projects and solving challenges associated with ethical considerations, data wrangling and formatting, web crawling, and data exploration through visualisation. During the course, participants will apply their new skills as they embark on their first data mining project under the supervision of the course lead. On completion of the course, participants will be equipped with the necessary skills to identify, extract, process, and visualise large volumes of online data.



Workshop design

Each day is divided into two main parts. The morning begins with a class covering one of the core web scraping and/or data mining topics. During this part, the instructor will discuss key concepts of web scraping and demonstrate how to apply them in the R environment. Afternoons will be devoted to practical exercises and activities, which will require participants to apply a range of data mining techniques to collect, process, and visualise online data. During this part, participants will work together and with the instructor to progress through the exercises.

Participants are welcome to use any remaining time to work on their own data mining projects. The instructor will happily assist with any individual project. Materials (lecture slides, sample datasets, handouts, exercises with solutions, annotated R scripts) will be made openly available via an online repository to all participants.



Detailed lecture plan (daily schedule)

Day 1.
Morning: Fundamental aspects of web scraping and data mining; Theory-driven research using large volumes of online data; Basics of HTML and CSS.
Afternoon: Applying CSS selectors using rvest to extract web content.

Day 2.
Morning: Overview of the HTTP protocol; Setting up web connections using R; Ethics and legalities of web scraping.
Afternoon: Building a first online scraper.

Day 3.
Morning: Principles for building robust online crawlers; Handling errors.
Afternoon: Designing, implementing, and testing a data scraping crawler.

Day 4.
Morning: Introduction to APIs and the JSON file format.
Afternoon: Extracting data from REST APIs.

Day 5.
Morning: Understanding different types of API authentication.
Afternoon: Establishing an OAuth connection with an API.



Class materials

All materials will be provided online.



Prerequisites

Participants are expected to have basic computer and statistical analysis skills. Good knowledge of R is necessary to participate in practical exercises and activities. The course will introduce participants to the basics of HTML, CSS, and JavaScript, so no prior knowledge of these topics is necessary.

Course leader

Dr Lukasz Walasek is an associate professor at the Department of Psychology, University of Warwick, UK.

Target group

graduate students, doctoral researchers, early career researchers, experienced researchers

Credits info

The Summer School cannot grant credits. We only deliver a Certificate of Participation, i.e. we certify your attendance.

If you are considering using Summer School workshops to obtain credits (ECTS), you will have to check with your home institution (contact the person/institute responsible for your degree) whether it recognises the Summer School and how many credits can be earned from a workshop/course with roughly 35 hours of teaching, no graded work, and no exams.

Make sure to investigate this matter before registering if this is important to you.

Fee info

CHF 700: Reduced fee: 700 Swiss Francs per weekly workshop for students (requires proof of student status).*

CHF 1100: Normal fee: 1100 Swiss Francs per weekly workshop for all others.*

Reduced Fee

To qualify for the reduced fee, you are required to send a copy of an official document that certifies your current student status or a letter from your supervisor stating your current position as a doctoral or postdoctoral researcher. Send this letter/document by e-mail to methodssummerschool@usi.ch.

*These fees also include participation in one of the preliminary workshops (a 2- or 3-day workshop preceding the Summer School). The registration fee for a preliminary workshop booked on its own is 200 CHF.