Projects

Below is a preliminary list of suggested projects, but you may also contact other possible supervisors. Send an e-mail to Gudmund Hermansen about your decision.

The project paper should be about 15 pages long and must include the official front page.

Project 1: Sparse Network Estimation with Gaussian Graphical Models (Camilla Lingjærde - camiling@math.uio.no). Network models are widely used to summarise dependence structures in multivariate data and to identify conditional relationships between variables. This project studies the estimation of undirected networks under Gaussian graphical models, where edges represent conditional dependence. The student will apply established methodology and compare frequentist methods (e.g., the graphical lasso and neighborhood selection with tuning/model selection) to Bayesian methods (e.g., the graphical horseshoe with posterior inference) for sparse network inference. To compare methods, the student will conduct a simulation study in R, varying sample size and dimension, with emphasis on edge-recovery performance.
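As a concrete starting point, the sketch below illustrates the kind of simulation study described above: data are drawn from a Gaussian graphical model with a known sparse precision matrix, the graphical lasso is fitted, and edge recovery is summarised by true- and false-positive rates. The packages (MASS, glasso), the tridiagonal true graph and the fixed penalty rho are illustrative assumptions only; in the project the penalty would be chosen by cross-validation or an information criterion, and Bayesian competitors such as the graphical horseshoe would be added.

```r
## Minimal simulation sketch; MASS and glasso are assumed installed,
## and the true graph and penalty value are illustrative choices.
library(MASS)    # mvrnorm, for sampling multivariate normal data
library(glasso)  # graphical lasso

set.seed(1)
p <- 20; n <- 100
Omega <- diag(p)                                   # sparse "true" precision matrix
for (j in 1:(p - 1)) Omega[j, j + 1] <- Omega[j + 1, j] <- 0.3
Sigma <- solve(Omega)                              # implied covariance matrix
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)

fit <- glasso(s = cov(X), rho = 0.1)               # penalised precision-matrix estimate

ut <- upper.tri(Omega)                             # compare edge sets off the diagonal
est_edges  <- (abs(fit$wi) > 1e-6)[ut]
true_edges <- (abs(Omega)  > 1e-6)[ut]
c(TPR = mean(est_edges[true_edges]),               # proportion of true edges recovered
  FPR = mean(est_edges[!true_edges]))              # proportion of non-edges falsely included
```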

Project 2: Statistical Methods for Imputation and Synthetic Data (Gudmund Hermansen - gudmunhh@math.uio.no). This project addresses the challenge of missing data, a common issue in real-world datasets that can lead to biased or unreliable analyses if not handled properly. The project investigates statistical methods for handling missing data, with an emphasis on imputation techniques and the generation of synthetic data. Both classical statistical approaches and machine learning-based methods are considered, and their impact on data quality, uncertainty, and downstream analyses is assessed.
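As an illustration of the classical approach, the sketch below runs multiple imputation with the mice package (an assumed choice, not prescribed by the project) on the built-in airquality data, which already contains missing values, and compares the pooled regression estimates with a complete-case analysis.

```r
## Hedged sketch of multiple imputation; mice is assumed installed, and the
## regression model and number of imputations are illustrative choices.
library(mice)

imp <- mice(airquality, m = 5, method = "pmm", seed = 1, printFlag = FALSE)
fit <- with(imp, lm(Ozone ~ Solar.R + Wind + Temp))  # same analysis on each completed data set
summary(pool(fit))                                   # pooled estimates via Rubin's rules

## For comparison: complete-case analysis, which simply drops rows with any NA
summary(lm(Ozone ~ Solar.R + Wind + Temp, data = airquality))
```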

Project 3: Knowledge Distillation for Tree-Based Models (Gudmund Hermansen - gudmunhh@math.uio.no). Knowledge distillation is a machine learning technique in which a complex model (the teacher), trained on a large (and potentially private) dataset, transfers information to a simpler model (the student) using model-generated predictions rather than observed labels. The teacher model may also belong to a different model class, such as a deep learning model, while the student is a decision tree. Knowledge distillation is an established method within neural networks, typically transferring knowledge from a complex network to one with a simpler structure; however, its effectiveness for tree-based models is less well understood. In this project, a teacher model is trained on an initial dataset and used to generate synthetic response values for a second dataset. These synthetic labels are used alone or combined with observed responses, and a new decision tree model is trained on the augmented data. The student tree may differ from the teacher in depth, structure, and/or training data, allowing an investigation of whether distillation can improve predictive accuracy, stability, or interpretability for tree-based models.
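A minimal sketch of this distillation setup is given below, assuming a random forest teacher (randomForest) and a decision tree student (rpart); the simulated regression data, the three-way data split and the default tuning are illustrative choices only.

```r
## Distillation sketch: teacher trained on one part of the data, synthetic labels
## generated for a second part, student tree trained on observed + synthetic labels.
library(randomForest)  # teacher model (assumed installed)
library(rpart)         # student decision tree

set.seed(1)
n <- 3000; p <- 5
X <- as.data.frame(matrix(rnorm(n * p), n, p)); names(X) <- paste0("x", 1:p)
y <- sin(X$x1) + X$x2^2 + rnorm(n, sd = 0.3)
grp <- sample(rep(c("teach", "transfer", "test"), each = n / 3))

d_teach    <- cbind(X[grp == "teach", ],    y = y[grp == "teach"])
d_transfer <- X[grp == "transfer", ]                      # unlabeled second dataset
d_test     <- cbind(X[grp == "test", ],     y = y[grp == "test"])

teacher <- randomForest(y ~ ., data = d_teach)            # complex teacher model
d_transfer$y <- predict(teacher, newdata = d_transfer)    # synthetic response values

tree_distilled <- rpart(y ~ ., data = rbind(d_teach, d_transfer))  # observed + synthetic labels
tree_plain     <- rpart(y ~ ., data = d_teach)                     # observed labels only

mse <- function(fit) mean((predict(fit, d_test) - d_test$y)^2)
c(distilled = mse(tree_distilled), plain = mse(tree_plain),
  teacher   = mean((predict(teacher, d_test) - d_test$y)^2))
```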

Project 4: Regularisation via Pseudo-Observations and Data Augmentation (Gudmund Hermansen - gudmunhh@math.uio.no). Regularisation is a fundamental concept in statistics and machine learning, commonly implemented through penalties, shrinkage, or prior assumptions. An alternative viewpoint is that regularisation can be interpreted as adding pseudo-observations to the data, sometimes referred to as data augmentation. In this project, we explore adding pseudo-observations based on expert knowledge and/or synthetic observations generated from alternative models. These “observations” are treated as additional data points and combined with the observed data to train a new model. The focus is on understanding how such pseudo-observations act as an implicit form of regularisation, influencing bias, variance, and model complexity, as well as how this approach relates to classical regularisation techniques and Bayesian interpretations.
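The pseudo-observation view can be made concrete with ridge regression: appending p pseudo-rows sqrt(lambda) * I with response 0 to the data and running ordinary least squares reproduces the ridge estimate exactly. The base-R sketch below verifies this on simulated data; the sample size, true coefficients and lambda = 2 are arbitrary choices.

```r
## Ridge regression as data augmentation with pseudo-observations (base R only).
set.seed(1)
n <- 50; p <- 4; lambda <- 2
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0, 0)
y <- X %*% beta + rnorm(n)

## Ridge estimate in closed form: (X'X + lambda I)^{-1} X'y
beta_ridge <- drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))

## The same estimate via augmentation: ordinary least squares on the augmented data
X_aug <- rbind(X, sqrt(lambda) * diag(p))        # p pseudo-observations
y_aug <- c(y, rep(0, p))                         # with pseudo-responses equal to 0
beta_aug <- lm.fit(X_aug, y_aug)$coefficients

cbind(ridge = beta_ridge, augmented = beta_aug)  # identical up to numerical error
```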

Project 5: Machine learning and high frequency financial time series (Gudmund Hermansen - gudmunhh@math.uio.no). In this project you will compare more traditional statistical models developed for high frequency financial time series with competing methods from machine learning. You will work with several examples of high frequency tick data from foreign exchange, and explore possibilities and limitations of both approaches.
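Since the FX tick data are not reproduced here, the hedged sketch below uses a simulated return series as a stand-in and compares a linear (AR-type) benchmark with a generic machine-learning regressor for one-step-ahead prediction; the randomForest model and the lag length are illustrative assumptions.

```r
## One-step-ahead comparison on lagged returns; randomForest is assumed installed,
## and the simulated AR series stands in for the foreign-exchange tick data.
library(randomForest)

set.seed(1)
r <- as.numeric(arima.sim(list(ar = 0.3), n = 2000, sd = 0.01))  # stand-in return series

k <- 5                                        # number of lags used as predictors
lagged <- embed(r, k + 1)                     # column 1 = r_t, columns 2..k+1 = its lags
d <- as.data.frame(lagged); names(d) <- c("r", paste0("lag", 1:k))
train <- d[1:1500, ]; test <- d[-(1:1500), ]

fit_lm <- lm(r ~ ., data = train)             # linear (AR-type) benchmark
fit_rf <- randomForest(r ~ ., data = train)   # machine-learning competitor

mse <- function(fit) mean((predict(fit, test) - test$r)^2)
c(linear = mse(fit_lm), random_forest = mse(fit_rf),
  naive = mean(test$r^2))                     # "predict zero return" baseline
```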

Project 6: Applied data analysis and statistical modelling for a Kaggle-like competition or dataset (Gudmund Hermansen - gudmunhh@math.uio.no). This project involves applied data analysis and predictive modelling. The student chooses a competition or a dataset of interest on one of the popular data science platforms: Kaggle, Topcoder or the UC Irvine Machine Learning Repository. Preliminary data analysis is then performed, followed by careful statistical modelling, inference and, finally, evaluation of predictions and interpretation of the results.

Project 8: Four projects for STK-MAT2011 (Emil Aas Stoltenberg - emilas@math.uio.no). Project descriptions.

Project 9: Anyone for chess? (Nils Lid Hjort - nils@math.uio.no and Gudmund Hermansen - gudmunhh@math.uio.no). Project description.

Project 10: Generative models for imbalanced data (Ingrid Hobæk Haff - ingrihaf@math.uio.no). Project description.

Project 11: Finance, Insurance and Risk Analysis (Frank Proske - proske@math.uio.no). Students who are interested in a project within finance, insurance and risk analysis should contact Frank Proske (proske@math.uio.no) or Gudmund Hermansen (gudmunhh@math.uio.no) for more information.

Project 12: Predicting monthly number of battle deaths (Gudmund Hermansen - gudmunhh@math.uio.no). In this project you will explore various statistical and machine learning models for modelling the monthly number of battle deaths in a given country. We will work with conflict data from https://viewsforecasting.org/, which is based on https://ucdp.uu.se/ (a comprehensive database on conflict data), and investigate battle-deaths time series for several countries. The main focus of the project is building a prediction model for the number of battle deaths in the following month, together with techniques for evaluating and comparing the performance of competing prediction models. Note that no prior knowledge of time series analysis is required, and a significant part of the project will be about practical data analysis and exploration.
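As a hedged illustration of the evaluation task, the sketch below runs an expanding-window, one-step-ahead comparison of two simple count forecasts on a simulated series standing in for the battle-deaths data: a "same as last month" baseline and a Poisson regression on the lagged count, scored by mean absolute error. The models, the simulated series and the score are illustrative assumptions only.

```r
## Rolling one-step-ahead forecast evaluation for monthly counts (base R only).
set.seed(1)
n_months <- 120
y <- numeric(n_months); y[1] <- 5
for (t in 2:n_months) y[t] <- rpois(1, lambda = 1 + 0.8 * y[t - 1])  # autocorrelated counts

origins <- 60:(n_months - 1)                  # expanding-window forecast origins
pred_naive <- pred_pois <- numeric(length(origins))
for (i in seq_along(origins)) {
  t <- origins[i]
  d <- data.frame(y = y[2:t], ylag = y[1:(t - 1)])          # data available at time t
  fit <- glm(y ~ ylag, family = poisson, data = d)          # simple count regression
  pred_pois[i]  <- predict(fit, newdata = data.frame(ylag = y[t]), type = "response")
  pred_naive[i] <- y[t]                                     # "same as last month" baseline
}
truth <- y[origins + 1]
c(MAE_poisson = mean(abs(pred_pois - truth)),
  MAE_naive   = mean(abs(pred_naive - truth)))
```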
