Cologne, Germany

Data Science Techniques for Survey Researchers

when 12 August 2024 - 16 August 2024
language English
duration 1 week
credits 4 EC
fee EUR 550

A variety of digital data sources are opening new avenues for empirical social science research. To use these data effectively to answer substantive research questions, researchers need a modern methodological toolkit paired with a critical perspective on data quality. This course introduces state-of-the-art data science techniques suited to collecting and analyzing digital behavioral data (so-called "big data") as well as traditional survey data. The course also discusses data quality and error frameworks for digital (big) data sources.

The course will cover the following topics and techniques:
- Overview of Big Data: What is it and why does it matter?
- Total Survey Error for Big Data
- Web Scraping
- Machine Learning for Social Scientists
  - Regularized regression
  - Tree-based methods
  - Support vector machines
  - Large language models (LLMs)

After the course, you will have a solid understanding of key methods from the data science toolkit for collecting and analyzing these data types. You will be able to apply these methods in your own research using statistical software.

Course leader

Anna-Carolina Haensch is an assistant research professor at the University of Maryland (UMD) and a senior researcher at LMU Munich.

Target group

You will find the course useful if:
- you are interested in learning some fundamental techniques in data science,
- you want to collect and work with digital behavioral data, be it administrative data or data found online,
- you want to understand what machine learning is.


Prerequisites
- General knowledge of statistics and statistical modelling (e.g., regression)
- Prior experience with syntax-based software (such as R, Stata, or Python)

Some basic experience with programming in R is very helpful, but not strictly necessary. For those without prior exposure to R, we will make sure that everyone is able to run R Markdown files. If you have no previous R knowledge, we encourage you to work through one or more R tutorials before the course. Some useful resources:
https://rstudio.cloud/learn/primers
http://www.statmethods.net/
https://swirlstats.com/
https://rmarkdown.rstudio.com/lesson-1.html (for R Markdown)

Course aim

By the end of the course, you will:
- understand the challenges of analyzing digital behavioral data,
- know the promises and benefits of (supervised) machine learning,
- be able to use (supervised) machine learning for data analysis,
- be familiar with some of the metrics used to assess the quality of these data types.

Credits info

4 EC
- Certificate of attendance issued upon completion.

Optional bookings:
The University of Mannheim awards 4 ECTS for regular attendance, satisfactory completion of the daily assignments, and submission of a 5,000-word paper to the lecturer(s) by 15 October at the latest (administration fee: EUR 70).

Fee info

EUR 550: Student/PhD student rate.
EUR 825: Academic/non-profit rate.
The rates include the tuition fee, course materials, the academic program, and coffee/tea breaks.

Scholarships

Scholarships are available from the European Survey Research Association (ESRA); see our website for more information.