Thesis Topics

This list includes topics for potential bachelor or master theses, guided research, projects, seminars, and other activities. Use Ctrl+F to search for keywords such as ‘machine learning’.

PLEASE NOTE: If you are interested in any of these topics, click the respective supervisor link to send a message with a simple CV, grade sheet, and topic ideas (if any). We will answer shortly.

Of course, your own ideas are always welcome!

Neural ODEs for Adaptive GAN Training

Type of Work:

Bachelor
Master

Keywords:

GANs
Neural Ordinary Differential Equations

Description:

The goal of this work is to integrate Neural Ordinary Differential Equations (Neural ODEs) into the training of Generative Adversarial Networks (GANs). While GANs are powerful and effective, they are notoriously difficult to train due to instability and mode collapse, both of which stem from the adversarial nature of the training framework. At the same time, Neural ODEs have demonstrated parameter efficiency by modeling data transformations as a continuous process. This project aims to leverage this property so that a GAN can dynamically adjust the number of function evaluations it requires during training, allowing the model to adapt as the generator improves.
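To make the idea of an adjustable number of function evaluations concrete, the following minimal sketch counts evaluations of an adaptive ODE solver at two tolerances. The embedded Euler/Heun solver, the toy dynamics, and all names are assumptions chosen for illustration; this is not the method of [2] or [3].

```python
import math

def adaptive_heun(f, y0, t0, t1, tol):
    """Integrate dy/dt = f(t, y) with an embedded Euler/Heun pair.

    Returns the final state and the number of function evaluations (NFE).
    """
    t, y, h = t0, y0, (t1 - t0) / 10.0
    nfe = 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        nfe += 2
        y_euler = y + h * k1
        y_heun = y + 0.5 * h * (k1 + k2)
        err = abs(y_heun - y_euler)  # local error estimate
        if err <= tol:
            t, y = t + h, y_heun     # accept the step
        # Shrink or grow the step size based on the error estimate.
        h *= min(2.0, max(0.2, 0.9 * math.sqrt(tol / (err + 1e-16))))
    return y, nfe

def dynamics(t, y):
    """Toy linear dynamics standing in for a learned vector field (assumption)."""
    return -2.0 * y

y_loose, nfe_loose = adaptive_heun(dynamics, 1.0, 0.0, 1.0, tol=1e-2)
y_tight, nfe_tight = adaptive_heun(dynamics, 1.0, 0.0, 1.0, tol=1e-5)
print(nfe_loose, nfe_tight)
```

A tighter tolerance forces smaller steps and therefore more evaluations, which is the quantity an adaptive training scheme could trade off against accuracy.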

[1] Generative Adversarial Networks, https://arxiv.org/abs/1406.2661
[2] Neural Ordinary Differential Equations, https://arxiv.org/abs/1806.07366
[3] Training Generative Adversarial Networks by Solving Ordinary Differential Equations, https://arxiv.org/abs/2010.15040

Data Augmentation for Small Datasets with Synthetic Elements

Type of Work:

Bachelor

Keywords:

data augmentation
small dataset
synthetic data
vision transformer

Description:

Traditional data augmentation (flipping, cropping, color jitter) is essential for training vision models, but it is limited to the compositions that already exist in the original dataset. This project is an extension of our paper ForAug (Foreground Augmentation) [1], which addresses a weakness of Vision Transformers (ViTs): their lack of inherent translation equivariance. ForAug works by separating foreground objects from their backgrounds and recombining them on the fly with different backgrounds, sizes, and positions in each training epoch. This process directly encodes spatial invariances into the training data, leading to better performance and reduced model biases (e.g., background robustness, center bias) on large datasets like ImageNet (resulting in the ForNet dataset).
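The recombination step can be pictured with a short sketch. It assumes the foreground has already been segmented into an image and a boolean mask; the function name `recombine`, the scale range, and the nearest-neighbour resize are illustrative assumptions and not the ForAug implementation.

```python
import numpy as np

def recombine(foreground, mask, background, rng, scale_range=(0.5, 1.0)):
    """Paste a masked foreground onto a background at a random size and position.

    foreground: (h, w, 3) float array, mask: (h, w) bool array,
    background: (H, W, 3) float array with H >= h and W >= w.
    """
    h, w = mask.shape
    H, W = background.shape[:2]
    scale = rng.uniform(*scale_range)
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # Nearest-neighbour resize via index lookup (keeps the sketch dependency-light).
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    fg = foreground[rows][:, cols]
    m = mask[rows][:, cols]
    # Random placement that keeps the object fully inside the background.
    top = rng.integers(0, H - new_h + 1)
    left = rng.integers(0, W - new_w + 1)
    out = background.copy()
    region = out[top:top + new_h, left:left + new_w]
    region[m] = fg[m]  # region is a view, so this writes into out
    return out, (int(top), int(left), new_h, new_w)

rng = np.random.default_rng(0)
foreground = np.ones((8, 8, 3))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
background = np.zeros((32, 32, 3))
image, placement = recombine(foreground, mask, background, rng)
```

Calling `recombine` again with the same inputs yields a different composition, which is how a fresh sample could be drawn in every epoch.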

The primary goal of this project is to implement and evaluate the ForAug data augmentation technique on small (domain-specific) datasets. By creating an equivalent “ForNet” version for a new task, we will investigate how this method—which recombines foreground objects with diverse backgrounds—impacts model performance compared to traditional augmentation methods on limited data. We will also extend the original ForAug mechanism by incorporating semi-synthetic samples, not only sourced from other datasets but also programmatically generated (e.g., via generative models) to introduce more diverse training samples.

[1] Nauen et al. “ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation”, arXiv, 2025

Controllable Outlier Generation using Diffusion (Residual) for Tabular Anomaly Detection

Type of Work:

Master

Keywords:

anomaly detection
diffusion models
synthetic outliers
tabular data

Description:

This thesis focuses on developing a controllable diffusion-based framework for synthetic outlier generation in tabular data. The method builds upon two complementary ideas: (1) static corruption strategies for labeled outlier creation, and (2) diffusion modeling over residual noise patterns. Instead of directly generating full outlier samples, the proposed method trains a conditional diffusion model to learn corruption distributions that represent how clean inlier samples are transformed into outliers. During inference, the diffusion model generates a residual conditioned on both the outlier subtype and the cell-level corruption mask, allowing precise control over where and how anomalies appear. The final outlier is formed by applying the learned residual to a clean inlier sample. This two-step residual diffusion approach mirrors traditional corruption-based anomaly simulation but replaces hand-crafted noise with a learned, data-driven corruption distribution. By combining interpretable corruption methods with modern generative diffusion modeling, this research aims to produce realistic and diverse synthetic anomalies that improve the robustness of downstream outlier detection models.
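The final composition step described above can be sketched as follows. The additive form `inlier + mask * residual` is one plausible reading of "applying the learned residual" and is an assumption, as is the Gaussian draw that stands in for a sample from the conditional diffusion model.

```python
import numpy as np

def apply_residual(inlier, residual, mask):
    """Form an outlier by adding a residual only at the masked cells."""
    return inlier + mask * residual

rng = np.random.default_rng(0)
inlier = np.array([50.0, 1.2, 300.0, 7.0])  # one clean row with four numeric columns
mask = np.array([0.0, 1.0, 0.0, 1.0])       # corrupt columns 1 and 3 only
# Stand-in for a residual sampled from the conditional diffusion model (assumption).
residual = rng.normal(loc=0.0, scale=5.0, size=inlier.shape)
outlier = apply_residual(inlier, residual, mask)
```

Cells outside the mask remain identical to the inlier, which is what gives cell-level control over where the anomaly appears.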

Synthetic Anomalies

Tabular Data Generation Using Diffusion Models

FinDiff: Diffusion Models for Financial Tabular Data Generation, https://arxiv.org/abs/2309.01472

NLP-based Protein Design

Type of Work:

Guided Research
Master

Keywords:

bioinformatics
CLIP models
multi-modal learning
natural language processing
protein design
protein engineering
text-guided design

Description:

This project explores using natural language descriptions to guide protein design and engineering. Instead of only using protein sequences and structures, this approach incorporates human knowledge about protein functions written in text format, such as “binds to DNA” or “catalyzes glucose breakdown.”

The thesis will investigate multi-modal frameworks that can understand both textual descriptions of desired protein functions and protein sequence data. Students will work on developing models that can take text descriptions like “design an enzyme that breaks down plastic” and generate corresponding protein sequences with those properties. This approach bridges the gap between high-level functional descriptions that humans understand and the complex molecular details needed for actual protein design.
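One common way to align two modalities, in the spirit of the CLIP keyword above, is a symmetric contrastive loss over matched pairs. The sketch below assumes text and protein encoders already exist and produce fixed-size embeddings; the random embeddings, the temperature, and the function names are illustrative assumptions.

```python
import numpy as np

def log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(text_emb, protein_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched (text, protein) pairs."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    p = protein_emb / np.linalg.norm(protein_emb, axis=1, keepdims=True)
    logits = t @ p.T / temperature            # cosine similarities, scaled
    diag = np.arange(logits.shape[0])         # row i matches column i
    loss_t2p = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_p2t = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_t2p + loss_p2t)

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                # placeholder embeddings (assumption)
aligned = clip_loss(emb, emb)
mismatched = clip_loss(emb, np.roll(emb, 1, axis=0))
```

The loss is low when each description is closest to its own protein and high when the pairing is shuffled, which is the signal a text-conditioned generator could build on.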

The goal is to make protein design more accessible by allowing researchers to describe what they want a protein to do in plain English, rather than requiring deep expertise in protein structure and biochemistry. This could accelerate the development of new enzymes for biotechnology, medicine, and environmental applications.

References:

A Text-guided Protein Design Framework
Natural Language Prompts Guide the Design of Novel Functional Protein Sequences

Object-Background combination. A novel data-augmentation for Object Detection

Type of Work:

Master

Keywords:

data augmentation
object detection
synthetic data
vision transformer

Description:

Object detection, the task of identifying and localizing instances of objects within an image by predicting bounding boxes and class labels, is a cornerstone of many computer vision applications such as surveillance, autonomous navigation, visual search, and robotics. The performance of deep learning models for object detection is critically dependent on the availability of large, diverse datasets with accurate bounding box annotations. Manually creating such datasets is a laborious and costly endeavor.

Data augmentation techniques are indispensable for synthetically enlarging training datasets, thereby improving model robustness and generalization. This thesis will investigate a novel, cutting-edge data augmentation method that segments objects from existing images and intelligently recombines them with various background images. A significant advantage of this approach is the automatic generation of ground-truth bounding boxes for the newly composed scenes, directly addressing a major bottleneck in object detection dataset creation.
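The automatic ground-truth generation mentioned above follows directly from the segmentation mask: once the paste position is known, the box is the extent of the mask shifted by that position. The sketch below assumes a boolean object mask and a fixed paste location; the function name and box convention are illustrative assumptions.

```python
import numpy as np

def paste_with_box(obj, mask, background, top, left):
    """Paste a segmented object and derive its bounding box from the mask.

    Returns the composed image and the box as (x_min, y_min, x_max, y_max)
    in pixel coordinates of the composed image, with inclusive corners.
    """
    h, w = mask.shape
    out = background.copy()
    region = out[top:top + h, left:left + w]
    region[mask] = obj[mask]  # region is a view, so this writes into out
    ys, xs = np.nonzero(mask)
    box = (left + int(xs.min()), top + int(ys.min()),
           left + int(xs.max()), top + int(ys.max()))
    return out, box

obj = np.full((6, 6, 3), 0.8)
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 2:4] = True  # object occupies rows 1-4, columns 2-3
background = np.zeros((20, 20, 3))
image, box = paste_with_box(obj, mask, background, top=10, left=5)
print(box)  # (7, 11, 8, 14)
```

No manual annotation is involved; the class label would simply be carried over from the source object.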

This thesis aims to thoroughly explore and leverage this novel object-based recombination data augmentation technique specifically for the task of object detection. We hypothesize that this method can substantially enhance the performance, robustness (e.g., to scale, occlusion, and varied contexts), and domain adaptation capabilities of object detection models.

Plant Genetics based AI-driven Breeding Tool

Type of Work:

Master

Keywords:

agriculture AI
crop optimization
genetics
genomics
machine learning
phenotype prediction
plant breeding

Description:

This project aims to develop an AI tool that helps plant breeders create better crops faster and more efficiently. Traditional plant breeding takes many years of trial and error to develop crops with desired traits like higher yield, disease resistance, or drought tolerance.

The thesis will focus on building machine learning models that can predict which plant crosses will produce the best offspring based on genetic data. Students will work with plant genomic datasets to train AI models that can suggest optimal breeding strategies. The tool will analyze genetic markers and predict traits like crop yield, nutritional content, or environmental adaptability before plants are actually grown.
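As a rough illustration of predicting traits from markers and ranking crosses, the sketch below fits a ridge regression on synthetic genotypes and scores each parent pair by the predicted trait of its mid-parent genotype. The additive model, the synthetic data, and all names are assumptions; a thesis would likely use real genomic datasets and richer models.

```python
import numpy as np

def fit_ridge(markers, trait, lam=1.0):
    """Closed-form ridge regression of a trait on marker genotypes."""
    p = markers.shape[1]
    mean_x, mean_y = markers.mean(axis=0), trait.mean()
    xc = markers - mean_x
    beta = np.linalg.solve(xc.T @ xc + lam * np.eye(p), xc.T @ (trait - mean_y))
    return beta, mean_y - mean_x @ beta

def rank_crosses(markers, beta, intercept):
    """Score every parent pair by the predicted trait of the mid-parent genotype."""
    n = markers.shape[0]
    scores = []
    for i in range(n):
        for j in range(i + 1, n):
            mid = 0.5 * (markers[i] + markers[j])  # expected offspring dosage
            scores.append((float(mid @ beta + intercept), i, j))
    return sorted(scores, reverse=True)

rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(60, 20)).astype(float)  # genotypes coded 0/1/2
effects = rng.normal(size=20)                              # synthetic marker effects
trait = markers @ effects + rng.normal(scale=0.1, size=60)
beta, intercept = fit_ridge(markers, trait)
ranking = rank_crosses(markers, beta, intercept)
best_score, parent_a, parent_b = ranking[0]
```

The top entries of `ranking` are the crosses the model would suggest trying first, before any plants are grown.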

This approach can significantly reduce the time needed to develop new crop varieties from decades to just a few years. The AI system will help farmers and researchers make data-driven decisions about which plants to breed together, leading to more sustainable agriculture and better food security.

References:

Genomic selection in plant breeding: methods, models, and perspectives
Machine learning for plant breeding and biotechnology

RNA Sequence Design using Deep Learning

Type of Work:

Master

Keywords:

bioinformatics
deep learning
neural networks
RNA design
sequence optimization

Description:

The goal of this project is to use deep learning methods to design RNA sequences with specific desired functions. Traditional RNA design relies on complex rules and manual optimization, which can be slow and limited. This thesis will explore how neural networks can learn patterns from existing RNA data to automatically generate new sequences that fold into target structures or perform specific biological tasks.

The project will focus on training deep learning models on RNA sequence-structure datasets and developing methods to generate functional RNA molecules. The approach will combine sequence generation techniques with structure prediction to ensure the designed RNAs can actually fold correctly and work as intended.
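A very simple check that a generated sequence could fold into a target structure is to verify that every paired position holds a valid base pair. The sketch below assumes the target is given in dot-bracket notation and accepts Watson-Crick and G-U wobble pairs; it tests a necessary condition only and is no substitute for a structure prediction model.

```python
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def paired_positions(structure):
    """Parse dot-bracket notation into a list of (i, j) base-pair indices."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def is_compatible(sequence, structure):
    """True if every paired position holds a Watson-Crick or G-U wobble pair."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in VALID_PAIRS
               for i, j in paired_positions(structure))

print(is_compatible("GGGAAAUCC", "(((...)))"))  # True
print(is_compatible("GGGAAAACC", "(((...)))"))  # False
```

A filter like this could discard obviously invalid candidates cheaply before a more expensive folding model is applied.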

References:

RNA design rules from a massive open laboratory
Improved RNA secondary structure prediction by maximizing expected accuracy

Efficient Optimization with Multi-Level Gradient Accumulation

Type of Work:

Master

Keywords:

machine learning
optimization

Description:

Multi-level methods are widely used in numerical analysis to solve problems efficiently by combining solutions across coarse and fine resolutions (levels). This project explores how a similar idea can be applied to gradient-based optimization in deep learning: gradients are first computed on coarse levels (e.g. low resolution or small size) using a large batch size, then refined using residual gradients from finer levels. The goal is to improve the quality of gradient estimates while reducing the computational cost of high-resolution training. The student will implement this approach in JAX and test it on models for classification or generative tasks. A background in deep learning and an interest in optimization techniques are important; familiarity with Python, JAX/PyTorch, and NumPy is a plus but not strictly required.
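A two-level version of this idea can be sketched in a few lines. The linear model, the pairwise-averaging `coarsen` operator, and the batch split are assumptions made for illustration, and NumPy is used here only to keep the sketch short; the project itself targets JAX.

```python
import numpy as np

def coarsen(x):
    """Average neighbouring pairs of features and repeat them (a crude low-resolution view)."""
    pooled = x.reshape(x.shape[0], -1, 2).mean(axis=2)
    return np.repeat(pooled, 2, axis=1)

def grad_mse(w, x, y):
    """Gradient of 0.5 * mean((x @ w - y)**2) with respect to w."""
    return x.T @ (x @ w - y) / x.shape[0]

def two_level_gradient(w, x_large, y_large, x_small, y_small):
    """Coarse gradient on a large batch plus a fine-minus-coarse residual on a small batch."""
    coarse = grad_mse(w, coarsen(x_large), y_large)
    residual = grad_mse(w, x_small, y_small) - grad_mse(w, coarsen(x_small), y_small)
    return coarse + residual

rng = np.random.default_rng(0)
w_true = rng.normal(size=8)
x = rng.normal(size=(4096, 8))
y = x @ w_true
w = np.zeros(8)
# Large batch for the coarse level, small batch for the fine-level correction.
g = two_level_gradient(w, x[:4000], y[:4000], x[4000:], y[4000:])
```

In this toy setting the coarse view costs as much as the fine one; the savings would come from models where the coarse level is genuinely cheaper to evaluate, so that the expensive fine level is only needed on the small batch.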
