19 August 2022
Applied Compositional Data Analysis: Multivariate and Functional Approaches
Compositional data can be characterized as multivariate observations carrying relative information, typically expressed in units such as proportions, percentages, mg/kg, ppm or mg/l, and they occur in a wide range of applications in the natural and social sciences. Compositions are thus primarily data where the relevant information is contained in (log-)ratios between components; this led to the development of the logratio methodology, which is now commonly considered the preferred choice for their statistical processing. The first aim of the course is to introduce the basic concepts of compositional data analysis, including the geometrical properties of compositions (which are characterized by the so-called Aitchison geometry) and interpretable logratio coordinate representations, which make it possible to apply popular multivariate methods to compositional data sets. Secondly, an important case of compositions is distributional data, usually resulting from the aggregation of large streams of data, which can be expressed in terms of the probability mass function of one or more random variables (factors). For two factors, this leads to the so-called compositional tables or, in general, to multifactorial compositional data. These can be decomposed orthogonally into independent and interactive parts, and an interpretable coordinate representation is built for each part. Finally, the functional counterpart to compositional data (distributional data expressed in the form of probability density functions) has recently gained increasing attention in applications. The course will provide an introduction to the analysis of these data using a Functional Data Analysis (FDA) approach grounded in the perspective of Bayes spaces. These are mathematical spaces whose points are densities, and they generalize the Aitchison geometry for multivariate compositional data to the FDA setting.
The theoretical parts of the course will be accompanied by examples with real-world data using the statistical software R.
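As a rough illustration of why logratios carry the relevant information, the sketch below computes the centered logratio (clr) representation of a composition and shows that it does not depend on the units in which the parts are expressed. It is only a minimal sketch: the clr transformation is one standard logratio representation and is assumed here for illustration (the course description does not specify which coordinates are used), the sample values are hypothetical, and Python is used instead of R purely for this example.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1 (proportions)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Aitchison distance, computed as Euclidean distance between clr vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical three-part composition, first in mg/kg, then as proportions.
sample_mg_kg = [120.0, 30.0, 850.0]
sample_prop = closure(sample_mg_kg)

clr_mg_kg = clr(sample_mg_kg)
clr_prop = clr(sample_prop)

# Rescaling multiplies every part by the same constant, which cancels
# in every ratio, so both clr vectors agree up to rounding error.
print(clr_mg_kg)
print(clr_prop)
print(aitchison_distance(sample_mg_kg, sample_prop))
```

The clr values always sum to zero, and the distance between the two versions of the sample is zero up to floating-point error, which reflects that only the ratios between parts matter.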
Karel Hron (Palacký University)
Course coordinator: Klaus Nordhausen (JYU)
Although the course will be taught at an applied level, the minimum prerequisites are one-semester undergraduate courses in statistics and mathematics. Familiarity with multivariate data analysis and the statistical software R will be beneficial; for those who do not work with R, instructions for using an alternative ("clickable") software package will be provided. Functional data analysis of densities will be taught from scratch, so no prerequisites are needed for that part.
Learning outcomes: Students will become familiar with the logratio methodology of compositional data analysis, so that by the end of the course they will be able to use it for the statistical processing of their own data. For this purpose, lecture notes and R scripts from the course will be provided.
EUR 0: Participation in the Summer School courses is free of charge, but students are responsible for covering their own meals, accommodation and travel costs, as well as any visa costs.
Jyväskylä Summer School is not able to grant Summer School students financial support.