18 August 2023
Machine Learning - What Is It Good For?
Machine learning is fashionable. But what is it, and how can it be put to good use in the social sciences? This introductory course provides an overview of some of the most important machine learning techniques and their social science applications. Those applications can be grouped into four areas:
1. Pattern recognition: How do variables hang together, and what groups do our cases form in terms of those variables? For example, political parties take positions on numerous issues. Can we group those issues into ideologies? Based on the issues, can we place the parties into clusters?
2. Preparing data for statistical analysis: Sometimes data are so voluminous that hand-coding them is near-impossible. We can leverage clever computer algorithms to do the coding for us. For example, we could use an artificial neural network to detect whether tweets, of which there are millions, come from a social bot or from a legitimate source.
3. Doing statistical analysis: As social scientists, we are used to building models with numerous parametric assumptions. What if we let algorithms leverage the data to obtain the model for us? That way, we may detect complex contingencies not previously theorised.
4. Anomaly detection: Some phenomena, such as war, are fortunately rare. However, this makes analysing them challenging. A whole subfield of machine learning is dedicated to the detection of such rare events or anomalies.
Through lectures and group exercises, the course shows applications in each area. After discussing the general principles of machine learning, the course spends half a day discussing unsupervised machine learning (relevant for area 1), 3.5 days on supervised machine learning techniques (relevant for application areas 2 and 3), and one day on anomaly detection (application area 4). On the last day, students present a machine learning project in groups.
Each day, students will learn the intuition behind the techniques, how they can be implemented in R, how they should be interpreted, and how they can be applied in the social sciences. The course is designed to minimise the level of mathematical complexity, although students can always look up the details in vignettes made available for the course. Both classification and regression tasks are considered. In the former, we seek to predict class membership; in the latter, we predict a numeric score. Interpretation is key, and we spend a great deal of time on various metrics and their implementations.
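To make the distinction between classification and regression concrete, here is a minimal sketch using k-nearest neighbours, the first technique on the course list. It is written in plain Python purely for illustration (the course itself works in R), and the data, labels, and function names are all made up for this example rather than taken from the course materials.

```python
from collections import Counter
import math


def nearest(train_x, query, k):
    """Return the indices of the k training points closest to `query` (Euclidean distance)."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    return [i for _, i in dists[:k]]


def knn_classify(train_x, labels, query, k=3):
    """Classification: predict class membership by majority vote among the k neighbours."""
    idx = nearest(train_x, query, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]


def knn_regress(train_x, scores, query, k=3):
    """Regression: predict a numeric score as the mean of the k neighbours' scores."""
    idx = nearest(train_x, query, k)
    return sum(scores[i] for i in idx) / k


# Hypothetical toy data: six parties placed on two issue dimensions.
positions = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
camps = ["left", "left", "left", "right", "right", "right"]  # class labels
scores = [2.0, 3.0, 1.0, 8.0, 9.0, 10.0]                     # numeric outcome

new_party = (1.2, 1.4)
print(knn_classify(positions, camps, new_party))  # predicted class
print(knn_regress(positions, scores, new_party))  # predicted numeric score
```

The same neighbours drive both predictions; only the way their outcomes are combined differs (a vote for classification, an average for regression).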
The course covers the following algorithms/techniques: (1) k-nearest neighbours; (2) probabilistic learning (including naïve Bayes as well as linear and quadratic discriminant analysis); (3) classification and regression trees, random forests, and model trees; (4) regression with regularisation and partial least squares; (5) artificial neural networks; (6) boosting; (7) cluster analysis; (8) SMOTE; (9) support vector machines; and (10) feature selection.
Marco Steenbergen is a professor of political methodology at the University of Zurich, Switzerland. His methodological interests span choice models, machine learning, measurement, and multilevel analysis.
graduate students, doctoral researchers, early career researchers, experienced researchers
The course assumes a basic familiarity with probability theory and with linear regression analysis. Prior familiarity with machine learning or related fields (e.g., NLP) is not required. On the other hand, a good knowledge of R is essential for the successful completion of the course. Students should know how to read data, how to transform variables, how to work with model objects, and how to create graphs using ggplot.
The Summer School cannot grant credits. We only deliver a Certificate of Participation, i.e. we certify your attendance.
If you are considering using Summer School workshops to obtain credits (ECTS), you will have to check with your home institution (contact the person or institute responsible for your degree) to find out whether they recognise the Summer School and how many credits can be earned from a workshop/course with roughly 35 hours of teaching, no graded work, and no exams.
Make sure to investigate this matter before registering if this is important to you.
CHF 700: Reduced fee: 700 Swiss Francs per weekly workshop for students (requires proof of student status).
CHF 1100: Normal fee: 1100 Swiss Francs per weekly workshop for all others.