Thesis Topics
This list includes topics for potential bachelor or master theses, guided research, projects, seminars, and other activities. Search with Ctrl+F for keywords, e.g. ‘machine learning’.
PLEASE NOTE: If you are interested in any of these topics, click the respective supervisor link to send a message with a simple CV, grade sheet, and topic ideas (if any). We will answer shortly.
Of course, your own ideas are always welcome!
Data Augmentation for Small Datasets with Synthetic Elements
Type of Work:
- Bachelor
Keywords:
- data augmentation
- small dataset
- synthetic data
- vision transformer
Description:
Traditional data augmentation (flipping, cropping, color jitter) is essential for training vision models, but it is limited to the compositions already present in the original dataset. This project is an extension of our paper ForAug (Foreground Augmentation) [1], which addresses a weakness of Vision Transformers (ViTs): their lack of inherent translation equivariance. ForAug separates foreground objects from their backgrounds and recombines them on the fly with different backgrounds, sizes, and positions in each training epoch. This directly encodes spatial invariances into the training data and, on large datasets such as ImageNet (yielding the ForNet dataset), improves performance, increases background robustness, and reduces model biases such as center bias.
The primary goal of this project is to implement and evaluate ForAug on small, domain-specific datasets. By creating an equivalent “ForNet” version for a new task, we will investigate how recombining foreground objects with diverse backgrounds affects model performance compared with traditional augmentation on limited data. We will also extend the original ForAug mechanism with semi-synthetic samples, sourced not only from other datasets but also generated programmatically (e.g., with generative models), to increase the diversity of the training samples.
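The recombination step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ForAug implementation from [1]: it assumes the foreground has already been segmented into an RGB array plus a boolean mask, uses nearest-neighbour resizing, and the function names and the scale range are hypothetical choices.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize for an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows][:, cols]

def recombine(fg, mask, bg, rng, scale_range=(0.3, 0.8)):
    """Paste a segmented foreground onto a background at a random size and position.

    fg:   (h, w, 3) foreground pixels
    mask: (h, w) boolean foreground mask
    bg:   (H, W, 3) background image (left unchanged)
    Returns a new (H, W, 3) image.
    """
    H, W = bg.shape[:2]
    h, w = mask.shape
    # Random target size relative to the background (hypothetical range).
    scale = rng.uniform(*scale_range) * min(H / h, W / w)
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    fg_r = resize_nearest(fg, new_h, new_w)
    mask_r = resize_nearest(mask, new_h, new_w)
    # Random top-left corner that keeps the object fully inside the image.
    top = rng.integers(0, H - new_h + 1)
    left = rng.integers(0, W - new_w + 1)
    out = bg.copy()
    region = out[top:top + new_h, left:left + new_w]  # view into out
    region[mask_r] = fg_r[mask_r]                     # copy only foreground pixels
    return out
```

Calling `recombine` again with the same generator draws a new size and position, which mirrors the per-epoch re-sampling described above; swapping `bg` for a different background image gives the foreground/background recombination.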
- [1] Nauen et al. “ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation”, arXiv, 2025
Controllable Outlier Generation using Residual Diffusion for Tabular Anomaly Detection
Type of Work:
- Master
Keywords:
- anomaly detection
- diffusion models
- synthetic outliers
- tabular data
Description:
This thesis focuses on developing a controllable diffusion-based framework for generating synthetic outliers in tabular data. The method builds on two complementary ideas: (1) static corruption strategies for creating labeled outliers, and (2) diffusion modeling over residual noise patterns. Instead of generating full outlier samples directly, the proposed method trains a conditional diffusion model to learn the distribution of corruptions, i.e. how clean inlier samples are transformed into outliers. During inference, the diffusion model generates a residual conditioned on both the outlier subtype and a cell-level corruption mask, allowing precise control over where and how anomalies appear. The final outlier is formed by applying the generated residual to a clean inlier sample. This two-step residual diffusion approach mirrors traditional corruption-based anomaly simulation but replaces hand-crafted noise with a learned, data-driven corruption distribution. By combining interpretable corruption methods with modern generative diffusion modeling, this research aims to produce realistic and diverse synthetic anomalies that improve the robustness of downstream outlier detection models.
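A minimal sketch of the residual formulation, assuming purely numeric features, a standard DDPM-style linear noise schedule, and a hypothetical static corruption (random rescaling of masked cells) to produce training residuals. The learned conditional denoiser, which would be trained to predict the noise given the outlier subtype and the mask, is omitted; only the residual targets, the forward noising of residuals, and the final masked composition are shown, and all function names are illustrative.

```python
import numpy as np

def static_corruption(x, mask, rng, low=2.0, high=5.0):
    """Hypothetical hand-crafted corruption: rescale the masked cells of x."""
    factors = rng.uniform(low, high, size=x.shape)
    return np.where(mask, x * factors, x)

def residual_target(x_clean, x_outlier):
    """Diffusion training target: the corruption itself, not the full outlier."""
    return x_outlier - x_clean

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def noise_residual(r0, t, alpha_bar, rng):
    """Forward process q(r_t | r_0) = N(sqrt(abar_t) * r_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(r0.shape)
    r_t = np.sqrt(alpha_bar[t]) * r0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return r_t, eps  # eps is the regression target of a noise-predicting denoiser

def apply_residual(x_clean, residual, mask):
    """Compose the outlier: only cells selected by the mask are changed."""
    return x_clean + mask * residual
```

At inference time, a trained denoiser conditioned on the subtype and mask would sample a residual from noise, and `apply_residual` would turn a clean row into the final outlier; multiplying by the mask is what keeps the anomaly confined to the chosen cells.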
References (synthetic anomalies):
- Fin-Fed-OD: Federated Outlier Detection on Financial Tabular Data
- Explaining Anomalies using Denoising Autoencoders for Financial Tabular Data
References (tabular data generation using diffusion models):
- FinDiff: Diffusion Models for Financial Tabular Data Generation (https://arxiv.org/abs/2309.01472)
NLP-based Protein Design
Type of Work:
- Guided Research
- Master
Keywords:
- bioinformatics
- CLIP models
- multi-modal learning
- natural language processing
- protein design
- protein engineering
- text-guided design
Description:
This project explores using natural language descriptions to guide protein design and engineering. Instead of relying only on protein sequences and structures, this approach incorporates human knowledge about protein functions expressed as text, such as “binds to DNA” or “catalyzes glucose breakdown.”
The thesis will investigate multi-modal frameworks that can understand both textual descriptions of desired protein functions and protein sequence data. Students will work on developing models that can take text descriptions like “design an enzyme that breaks down plastic” and generate corresponding protein sequences with those properties. This approach bridges the gap between high-level functional descriptions that humans understand and the complex molecular details needed for actual protein design.
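One common way to align text and protein representations, in the spirit of the CLIP models listed in the keywords, is contrastive training that pulls matching text/protein embedding pairs together and pushes mismatched pairs apart. The sketch below shows only the symmetric contrastive (InfoNCE) loss, assuming that separate text and protein encoders already produce fixed-size embeddings; the encoders, the temperature value, and the function names are illustrative assumptions.

```python
import numpy as np

def l2_normalize(z, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(text_emb, protein_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of both matrices describes the same protein."""
    t = l2_normalize(text_emb)
    p = l2_normalize(protein_emb)
    logits = t @ p.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(t))
    # Text -> protein: each text row should pick its own protein column.
    loss_t2p = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Protein -> text: each protein column should pick its own text row.
    loss_p2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t2p + loss_p2t)
```

Such an alignment objective only learns a shared embedding space; generating sequences from a text prompt would additionally require a sequence generator conditioned on the text embedding, which is outside this sketch.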
The goal is to make protein design more accessible by allowing researchers to describe what they want a protein to do in plain English, rather than requiring deep expertise in protein structure and biochemistry. This could accelerate the development of new enzymes for biotechnology, medicine, and environmental applications.
References:
- A Text-guided Protein Design Framework
- Natural Language Prompts Guide the Design of Novel Functional Protein Sequences
Object-Background Recombination: A Novel Data Augmentation for Object Detection
Type of Work:
- Master
Keywords:
- data augmentation
- object detection
- synthetic data
- vision transformer
Description:
Object detection, the task of identifying and localizing instances of objects within an image by predicting bounding boxes and class labels, is a cornerstone of many computer vision applications such as surveillance, autonomous navigation, visual search, and robotics. The performance of deep learning models for object detection is critically dependent on the availability of large, diverse datasets with accurate bounding box annotations. Manually creating such datasets is a laborious and costly endeavor.
Data augmentation techniques are indispensable for synthetically enlarging training datasets, thereby improving model robustness and generalization. This thesis will investigate a novel data augmentation method that segments objects from existing images and recombines them with various background images. A significant advantage of this approach is that ground-truth bounding boxes for the newly composed scenes are generated automatically, directly addressing a major bottleneck in object detection dataset creation.
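Why the boxes come for free can be shown in a few lines: once an object's segmentation mask is pasted at a known offset, its bounding box is the extent of the mask's nonzero pixels shifted by that offset. A minimal sketch, assuming boolean masks, paste positions that keep the object inside the image, and boxes in (x_min, y_min, x_max, y_max) pixel coordinates with exclusive max; `paste_object` and `compose_scene` are hypothetical names.

```python
import numpy as np

def mask_to_box(mask):
    """Tight (x_min, y_min, x_max, y_max) box around True pixels; max is exclusive."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

def paste_object(image, obj, mask, top, left):
    """Paste a segmented object onto image and return (new_image, box)."""
    h, w = mask.shape
    out = image.copy()
    region = out[top:top + h, left:left + w]  # view into out
    region[mask] = obj[mask]
    x0, y0, x1, y1 = mask_to_box(mask)
    # Shift the box from object coordinates into image coordinates.
    box = (x0 + left, y0 + top, x1 + left, y1 + top)
    return out, box

def compose_scene(background, objects):
    """Paste (obj, mask, top, left, label) items in order and collect annotations.

    Note: boxes of earlier objects are not shrunk if later objects occlude them.
    """
    image, annotations = background, []
    for obj, mask, top, left, label in objects:
        image, box = paste_object(image, obj, mask, top, left)
        annotations.append({"box": box, "label": label})
    return image, annotations
```

Handling occlusion between pasted objects (for example, recomputing or dropping heavily occluded boxes) is one of the design questions such a pipeline would need to address.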
This thesis aims to thoroughly explore and leverage this novel object-based recombination data augmentation technique specifically for the task of object detection. We hypothesize that this method can substantially enhance the performance, robustness (e.g., to scale, occlusion, and varied contexts), and domain adaptation capabilities of object detection models.
AI-Driven Plant Breeding Tool Based on Plant Genetics
Type of Work:
- Master
Keywords:
- agriculture AI
- crop optimization
- genetics
- genomics
- machine learning
- phenotype prediction
- plant breeding
Description:
This project aims to develop an AI tool that helps plant breeders create better crops faster and more efficiently. Traditional plant breeding takes many years of trial and error to develop crops with desired traits like higher yield, disease resistance, or drought tolerance.
The thesis will focus on building machine learning models that can predict which plant crosses will produce the best offspring based on genetic data. Students will work with plant genomic datasets to train AI models that can suggest optimal breeding strategies. The tool will analyze genetic markers and predict traits like crop yield, nutritional content, or environmental adaptability before plants are actually grown.
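A common baseline for predicting traits from genetic markers is genomic selection with ridge regression on marker genotypes (the idea behind rrBLUP). The sketch below fits marker effects on a genotype matrix coded 0/1/2, predicts genomic estimated breeding values (GEBVs), and ranks candidate crosses by their mid-parent GEBV, which is the expected progeny mean under a purely additive model. The additive assumption, the fixed regularization strength, and all function names are simplifying assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def fit_marker_effects(X, y, lam=1.0):
    """Ridge regression: beta = (Xc^T Xc + lam * I)^-1 Xc^T (y - mean(y))."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    y_mean = y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return beta, x_mean, y_mean

def predict_gebv(X, beta, x_mean, y_mean):
    """Genomic estimated breeding values for genotypes X (rows = plants)."""
    return y_mean + (X - x_mean) @ beta

def rank_crosses(X_parents, beta, x_mean, y_mean, top_k=3):
    """Rank all parent pairs by mid-parent GEBV (expected progeny mean, additive model)."""
    gebv = predict_gebv(X_parents, beta, x_mean, y_mean)
    crosses = [((i, j), 0.5 * (gebv[i] + gebv[j]))
               for i, j in combinations(range(len(gebv)), 2)]
    crosses.sort(key=lambda c: c[1], reverse=True)
    return crosses[:top_k]
```

The mid-parent value ignores the spread of progeny values around the mean, so a tool that suggests breeding strategies would likely also need to model progeny variance and non-additive effects.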
This approach could substantially shorten the time needed to develop new crop varieties, which in traditional breeding often spans many years. The AI system will help farmers and researchers make data-driven decisions about which plants to cross, supporting more sustainable agriculture and better food security.
References:
- Genomic selection in plant breeding: methods, models, and perspectives
- Machine learning for plant breeding and biotechnology
RNA Sequence Design using Deep Learning
Type of Work:
- Master
Keywords:
- bioinformatics
- deep learning
- neural networks
- RNA design
- sequence optimization
Description:
The goal of this project is to use deep learning methods to design RNA sequences with specific desired functions. Traditional RNA design relies on complex rules and manual optimization, which can be slow and limited. This thesis will explore how neural networks can learn patterns from existing RNA data to automatically generate new sequences that fold into target structures or perform specific biological tasks.
The project will focus on training deep learning models on RNA sequence-structure datasets and developing methods to generate functional RNA molecules. The approach will combine sequence generation techniques with structure prediction to ensure the designed RNAs can actually fold correctly and work as intended.
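The structural constraint a generator has to satisfy can be made concrete with the dot-bracket notation commonly used for RNA secondary structures. The sketch below parses a target structure into base pairs, checks whether a candidate sequence can form those pairs (Watson-Crick pairs plus the G-U wobble pair), and samples random compatible sequences as a naive baseline. It deliberately omits the structure-prediction step, which is needed to verify that a sequence actually folds into the target rather than merely being able to; the function names are hypothetical.

```python
import random

# Allowed base pairs: Watson-Crick plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def is_compatible(sequence, structure):
    """True if every paired position in the structure can form an allowed pair."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))

def sample_compatible(structure, rng=random):
    """Draw a random sequence that satisfies the pairing constraints of the structure."""
    seq = [rng.choice("ACGU") for _ in structure]  # unpaired positions: any base
    for i, j in parse_dot_bracket(structure):
        seq[i], seq[j] = rng.choice(sorted(ALLOWED_PAIRS))
    return "".join(seq)
```

A learned generator would replace `sample_compatible`, and a structure predictor would then filter or score its outputs, which corresponds to the combination of generation and structure prediction described above.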
References:
- RNA design rules from a massive open laboratory
- Improved RNA secondary structure prediction by maximizing expected accuracy
Efficient Optimization with Multi-Level Gradient Accumulation
Type of Work:
- Master
Keywords:
- machine learning
- optimization
Description:
Multi-level methods are widely used in numerical analysis to solve problems efficiently by combining solutions across coarse and fine resolutions (levels). This project explores how a similar idea can be applied to gradient-based optimization in deep learning: gradients are first computed on coarse levels (e.g., low resolution or small size) using a large batch size, then refined using residual gradients from finer levels. The goal is to improve the quality of gradient estimates while reducing the computational cost of high-resolution training. The student will implement this approach in JAX and test it on models for classification or generative tasks. A background in deep learning and an interest in optimization techniques are important; familiarity with Python, JAX/PyTorch, and NumPy is a plus but not strictly required.
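The coarse-to-fine idea can be written as a telescoping sum, as in multilevel Monte Carlo: the fine-level gradient satisfies g_L = g_0 + (g_1 - g_0) + ... + (g_L - g_(L-1)), so a large-batch estimate on the coarsest level can be corrected by smaller-batch estimates of the level differences. The sketch below is one possible reading of the approach, not a prescribed design. It assumes a toy linear least-squares model, 1D signals in which "lower resolution" means average pooling followed by repetition, and correction terms that evaluate both levels on the same samples. All names and batch sizes are hypothetical, and NumPy is used only to keep the toy self-contained (the project itself targets JAX).

```python
import numpy as np

def at_level(x, level, num_levels):
    """Lower-resolution copy of signals x (batch, d): average-pool, then repeat.

    level = num_levels - 1 is full resolution; d must be divisible by 2**(num_levels - 1).
    """
    factor = 2 ** (num_levels - 1 - level)
    if factor == 1:
        return x
    b, d = x.shape
    pooled = x.reshape(b, d // factor, factor).mean(axis=2)
    return np.repeat(pooled, factor, axis=1)

def per_sample_grad(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w, for each row of x."""
    residual = x @ w - y
    return residual[:, None] * x

def multilevel_grad(w, X, Y, batch_sizes, rng):
    """Estimate the full-resolution gradient as g_0 + sum_l (g_l - g_(l-1)).

    batch_sizes[l] samples are drawn for level l (typically decreasing with l).
    Each correction evaluates levels l and l-1 on the same samples, so the
    difference has low variance when neighbouring levels are similar.
    """
    L = len(batch_sizes)
    idx = rng.choice(len(X), size=batch_sizes[0], replace=False)
    grad = per_sample_grad(w, at_level(X[idx], 0, L), Y[idx]).mean(axis=0)
    for level in range(1, L):
        idx = rng.choice(len(X), size=batch_sizes[level], replace=False)
        fine = per_sample_grad(w, at_level(X[idx], level, L), Y[idx])
        coarse = per_sample_grad(w, at_level(X[idx], level - 1, L), Y[idx])
        grad += (fine - coarse).mean(axis=0)
    return grad
```

Setting every batch size to the dataset size recovers the full-resolution gradient exactly, which is a useful sanity check. In this toy the repetition keeps the input dimension fixed so that the same parameters apply at every level; in a real network the coarse levels would be cheaper because they process fewer pixels, and whether small fine-level batches preserve gradient quality is exactly what the project would evaluate.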