MEDFL5155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data
Course description
Course content
The course covers methods that are integral to data analysis in modern molecular medical research. As such, it is relevant both to students who need to analyze large-scale molecular data themselves and to those who need to interpret results and understand publications in the molecular life sciences.
High-throughput techniques are becoming increasingly prevalent in life-science research and in the clinic. However, making effective use of the resulting large datasets requires understanding and applying more advanced statistical methods, as well as following good practices in programming and data analysis. We will describe guidelines for good practice, such as the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, and introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:
a) high-throughput screening (multiple testing and group tests),
b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),
c) supervised learning (classification and prediction, cross-validation and bootstrapping).
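As an illustration of topic a), the short sketch below shows why multiple testing matters when thousands of features are screened at once. It contrasts two standard corrections, Bonferroni (controls the family-wise error rate) and the Benjamini-Hochberg step-up procedure (controls the false discovery rate). This is a minimal, stdlib-only sketch written for this description, not course material; the function names `bonferroni` and `benjamini_hochberg` and the example p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (family-wise error rate control)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (false discovery rate control).

    Finds the largest rank k with p_(k) <= (k / m) * alpha and rejects
    the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank whose p-value lies under its BH threshold
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from screening eight features
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bonferroni(pvals))          # 1 rejection
print(benjamini_hochberg(pvals))  # 2 rejections
```

With these made-up values, Bonferroni rejects only the smallest p-value, whereas Benjamini-Hochberg also rejects the second one; this trade-off between strict error control and statistical power is typical when choosing a correction for high-throughput screens.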
We will also introduce reference sources and molecular databases that can aid interpretation, and show how they can be accessed and integrated into a data analysis.