3 February 2022
Missing Data in R
Missing data arise in nearly every data-analytic enterprise. Simple ad hoc techniques for dealing with missing values, such as deleting incomplete cases or replacing missing values with the item mean, can cause a host of (hidden) problems. In this workshop, we will discuss principled methods for treating missing data and how to apply these methods in R. We will cover basic missing data theory, methods for exploring and quantifying the extent of the missing data problem, and two principled methods for treating the missing data: multiple imputation and full information maximum likelihood. Students will practice what they learn via practical exercises.
In the morning and early afternoon, new content will be presented via interactive lectures. Later in the afternoon, students will practice what they have learned via practical exercises. If the schedule permits, students are also welcome to ask the instructor for advice on their own data analyses.
We will not cover basic R usage. Students should already know how to use R to read and write data, do basic data manipulations, run R functions, and work with the results returned by R functions.
Participants should bring their own laptop computer with both R and RStudio installed.
This course is intended for professionals who seek a master's-level introduction to missing data analysis.
Please note that this course includes no graded activities, so we cannot provide students with a transcript of grades. You will receive a certificate upon completion of the course.
After completing this course, students can:
Describe the most important characteristics of a missing data problem and choose appropriate statistics, metrics, or visualisations to quantify/illustrate those characteristics.
Describe the three missing data mechanisms and their effects on data analyses.
Describe the fraction of missing information, how it is interpreted, and why it is important.
Describe the strengths and weaknesses of traditional, ad-hoc missing data treatments.
Describe multiple imputation (MI): what it is, why it works, and why it is superior to traditional, ad-hoc techniques.
Describe the steps in an MI-based analysis.
Describe full information maximum likelihood (FIML): what it is, why it works, and why it is superior to traditional, ad-hoc techniques.
Compare and contrast the relative strengths and weaknesses of MI and FIML.
Write basic R scripts to do the following:
Explore a missing data problem with appropriate statistics, metrics, and visualisations.
Conduct an MI-based analysis.
Conduct a FIML-based analysis.
EUR 150: Course + course materials + lunch