
Social Sciences

Web Data Collection with Python and R

When:

09 September - 13 September 2024

School:

GESIS Fall Seminar

Institution:

In cooperation with University of Cologne

City:

Mannheim

Country:

Germany

Language:

English

Credits:

2.0 EC

Fee:

550 EUR

The exponential increase in online and social media data offers unprecedented opportunities for advancing research across a variety of fields, both within academia and outside of it. This course provides researchers with the tools needed to collect and pre-process large-scale data from a range of online sources. The course will be offered in both R and Python. Students can attend taught sessions in both programming languages in the morning and choose their preferred language for individual/group work and exercises in the afternoon. The content and examples used in the lecturer-led tutorials are similar across the two languages, so participants who want to build skills in a less familiar language can do so by drawing parallels between the two sessions.

Through a combination of lectures, hands-on tutorials and individual/group exercises, participants will develop a theoretical understanding of the challenges associated with online data collection and of the best methods and tools for addressing them in R and in Python. They will also gain the practical skills needed to collect data through Application Programming Interfaces (APIs), navigate dynamic websites and scrape data from both static and dynamic web pages. The sources used in the examples include social media websites, online media outlets and news aggregators, government data portals and other large-scale online data repositories.

The most difficult part of a computational project involving the collection of complex and heterogeneous data is often the pre-processing needed to prepare the data for subsequent analysis and to link it across a variety of sources. The course therefore also covers text-based methods for data cleaning and pre-processing. By the end of the week, participants should be able to apply the methods studied to extract and process data for their own research projects.

Course leader

Iulia Cioroianu, University of Bath, England.

Target group

You will find the course useful if:

- you want to learn how to quickly collect and process large amounts of data from online sources.
- you aim to improve your existing web scraping skills or have so far encountered difficulties trying to scrape data from online sources.
- you have a research idea for which online data might be suitable, but you are not sure of the practical implications.

Course aim

By the end of the course you will:

- understand the structure and basic features of different forms of online data.
- be able to collect data from static and dynamic websites.
- be able to interact with APIs to access and collect data.
- be able to parse, clean and process the data collected.
- be able to apply the methods studied to your own research projects.

Fee:

550 EUR, Student/PhD student rate.

Fee:

825 EUR, Academic/non-profit rate. The rates include the tuition fee, course materials, the academic program, and coffee/tea breaks.
