MF9155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data
Course description
Course content
The course considers methods integral to data analysis in modern molecular medical research. As such it is relevant to all PhD students and researchers who need to analyze large-scale molecular data themselves, as well as those who need to interpret results and understand publications in the molecular life sciences.
High-throughput techniques are becoming increasingly prevalent in life sciences research and in the clinic. However, to make effective use of the resulting large datasets, it is necessary to understand and apply more advanced statistical methods and to follow good practices in programming and data analysis. We will describe guidelines for good practice such as the FAIR data principles and introduce the statistical concepts behind typical data analysis tasks for large-scale biological data, including the following topics:
a) high-throughput screening (multiple testing and group tests),
b) unsupervised learning and data visualization (clustering and heatmaps, dimension reduction methods),
c) supervised learning (classification and prediction, cross-validation and bootstrapping).
We will also introduce reference sources and molecular databases that can aid interpretation and will show how they can be accessed and integrated into a data analysis.
Methods will be demonstrated by replicating analyses from publications, and real-life gene expression data will be used in the computer labs.
To encourage continued learning after the course, we will also provide an overview of available web-based courses and exercises.
Learning outcome
Knowledge:
- Learn important statistical and bioinformatics concepts for analysing molecular data, including good practices in programming and data analysis.
- Have knowledge of the specific statistical challenges associated with the analysis of high-throughput biological data.
- Know important molecular databases and relevant statistics/bioinformatics software tools.
- Understand some of the challenges you will face when trying to apply this knowledge to the analysis of real datasets.
Skills:
- Be able to identify the data analysis problem and match the appropriate type of statistical method and corresponding software.
- Perform basic analyses of high-throughput biological data using R and Bioconductor.
- Be able to understand and critically evaluate the data analysis procedures in publications in molecular biology/molecular medicine.
Admission to the course
Maximum number of participants is 30-35. PhD candidates at UiO will be prioritized.
Applicants admitted to a PhD programme at UiO apply to this course in StudentWeb.
Applicants who are not admitted to a PhD programme at UiO must apply for a right to study before they can apply to this course. See information here: How to apply for a right to study and admission to elective PhD courses in medicine and health sciences.
Applicants will receive a reply to the course application in StudentWeb at the latest one week after the application deadline.
Recommended previous knowledge
Students should have passed the exam in an introductory course in statistics (for example MF9130, MF9130E).
Students should also have working knowledge and practical experience in analysing data with the statistical programming language R. Basic familiarity with the Unix shell is also required, for example by having completed a software carpentry workshop.
It is recommended that students have a basic understanding of molecular biology, at least roughly corresponding to 5-10 university study points in molecular biology or similar. Students who have not completed an introductory course in R could for example complete an introductory online course or follow a software carpentry course at UiO.
Overlapping courses
- 5 credits overlap with MEDFL5155 – Introduction to statistics and bioinformatics for the analysis of large-scale biological data.
- 2 credits overlap with MF9395.
- 2 credits overlap with MEDFL5395.
- 2 credits overlap with MF9385.
- 2 credits overlap with MEDFL5385.
Teaching
The teaching will be organized as an intensive course over six days.
There will be lectures coupled with hands-on practicals and example data analyses in the computer labs as well as group project work.
Students will need to allow for sufficient time in advance for course preparations, which include some required reading, as well as after the course for the home exam.
You have to participate in at least 80 % of the teaching to be allowed to take the exam. Attendance will be registered.
Examination
Home exam in the form of a comprehensive data analysis task based on a recent publication. The exam is to be submitted four weeks after completion of the course.
Language of examination
The examination text is given in English, and you submit your response in English.
Grading scale
Grades are awarded on a pass/fail scale. Read more about the grading system.
More about examinations at UiO
- Use of sources and citations
- Special exam arrangements due to individual needs
- Withdrawal from an exam
- Illness at exams / postponed exams
- Explanation of grades and appeals
- Resitting an exam
- Cheating/attempted cheating
You will find further guides and resources at the web page on examinations at UiO.