7 August 2021
Data Science: Foundations of Data Analytics
The most important aspect of computer science is problem solving, an essential skill for life.
Data Science is concerned with how to gain knowledge from the vast volumes of data generated daily in modern life, from social networks to scientific research and finance, and proposes sophisticated computing techniques for processing this deluge of information. This course addresses fundamental aspects of Data Science, such as analytical models to represent and understand the data and efficient algorithms to manipulate it and extract relevant knowledge.
In particular, students study the design, development and analysis of software and hardware used to solve problems in a variety of business, scientific and social contexts. During this course, students will study techniques for going from raw data to a deeper understanding of the patterns and structures within the data, in order to support prediction and decision making. Students are expected to have some basic knowledge of linear algebra and calculus.
Data Analytics involves being able to go from raw data to a deeper understanding of the patterns and structures within the data, to support making predictions and decisions. The course will cover a number of topics, including:
- Introduction to analytics, case studies - How analytics is used in practice. Examples of successful analytics work from companies such as Google, Facebook, Kaggle, and Netflix.
- Basic tools: command line tools, plotting tools, programming tools - The wide variety of tools available to work with data, including Unix/Linux command line tools for data manipulation (sorting, counting, reformatting, aggregating, joining), and tools such as gnuplot for displaying and visualizing data.
- Statistics: Probability recap, distributions, significance tests, R - The tools from statistics for understanding distributions and probability (means, variance, tail bounds). Hypothesis testing for determining the significance of an observation, and the R system for working with statistical data.
- Regression: linear regression, least squares, logistic regression - Predicting new data values via regression models. Simple linear regression over low-dimensional data, regression for higher-dimensional data via least squares optimization, logistic regression for categorical data.
- Matrices: Linear Algebra, SVD, PCA - Matrices to represent relations between data, and necessary linear algebraic operations on matrices. Approximately representing matrices by decompositions (Singular Value Decomposition and Principal Components Analysis). Application to the Netflix prize.
- Clustering: hierarchical, k-means, k-center - Finding clusters in data via different approaches. Choosing distance metrics. Different clustering approaches: hierarchical agglomerative clustering, k-means (Lloyd's algorithm), k-center approximations. Relative merits of each method.
- Classification: Trees, NB, Support Vector Machines, Kernel Trick - Building models to classify new data instances. Decision tree approaches and Naive Bayes classifiers. The Support Vector Machines model and use of Kernels to produce separable data and non-linear classification boundaries. The Weka toolkit.
- Graphs: Social Network Analysis, metrics, relational learning - Graph representations of data, with applications to social network data. Measurements of centrality and importance. Classification and prediction.
- Recommendations in social networks: neighbor-based and latent-based methods.
- Time-Series Analysis: dynamic time warping, dimensionality reduction, autoregressive moving averages.
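As a taste of the regression topic listed above, here is a minimal sketch of simple linear regression by least squares in plain Python. It is only an illustration, not course material: the function names `fit_line` and `predict` and the toy data are assumptions made for this example, and the course itself may use R or other tools instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a value for a new example from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


# Toy data (made up for this sketch) lying exactly on y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = predict(model, 5)
```

The closed-form slope is the ratio of the x-y covariance to the variance of x, which is the one-dimensional case of the least squares optimization mentioned in the regression topic.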
By the end of the module, the student should be able to:
- Understand the basic mathematical models for large data sets.
- Understand the principles and purposes of data analytics, and articulate the different dimensions of the area.
- Work with and manipulate a data set to extract statistics and features, coping with missing and dirty data.
- Apply basic data mining and machine learning techniques to build a classifier or regression model, and predict values for new examples.
Professor Florin Ciucu, Associate Professor, Department of Computer Science at the University of Warwick
Anyone aged 18+ can study this course. Basic knowledge of linear algebra and calculus is beneficial.
To understand the fundamental skills in data analytics, including preparing and working with data; abstracting and modelling an analytic question; and using tools from statistics, learning and mining to address the question.
You must check with the relevant office of your institution whether you will be awarded credit, but many institutions will allow this. In general, you'll earn 3 credits in the US system, and 7.5 ECTS in the European system. Warwick will provide any necessary supporting evidence to help evaluate the worth of the course.
GBP 2070: Tuition fee (includes a 10% early booking discount, social programme and guest lecture series)
We offer enhanced discounts for Warwick alumni, Warwick study abroad partners and group bookings of 5+ students