Seminar Applied Artificial Intelligence

Content

Selected topics from the field of socio-technical knowledge (see topics of the lecture Collaborative Intelligence). The seminar teaches the students how to write and present a scientific paper on a specific topic. Students are also introduced to doing a literature review of scientific papers. The final presentation is carried out in the form of a block event. More details will be given in the mandatory introductory meeting.

Requirements

This seminar is offered to both Bachelor and Master students. Registration via OLAT is required for this seminar; the access code will be given in the introductory meeting. As we provide each student with a topic and a tutor, the number of seminar places is limited to the number of topics available this semester (see below).

Materials

You can find all course materials, news and information in OLAT. The access code for the OLAT course will be given in the introductory meeting.

Organisation

The seminars of all students take place as a block event at the end of the semester.

There will be a mandatory introductory online meeting via BigBlueButton (BBB), where the form and topics of the seminar will be presented, and the OLAT access code will be published:

During the lecture period, students will work on the assigned seminar topics. Discussions with their supervisor will take place individually. We offer a mandatory two-hour course about scientific writing and working with LaTeX.

The paper is to be written in English and should be 10 pages long (Bachelor: 8 pages). The presentations, which are also given in English, take place at the end of the semester and last about 25 minutes each (Bachelor: 20 minutes), including questions. Students must use the provided templates for both the seminar paper and the presentation slides.

Topics

A list of topics, each with a short description and the corresponding supervisor, can be found below. Please do not contact the supervisors until after the topic assignment round.

  • Robustness in ML models to missing information (Francisco Mena) Machine learning (ML) models are techniques used to make predictions from input data, e.g. weather forecasting. However, these models usually assume that the input data is correct (free of errors) and complete (no missing information). When real-time data from different sources is used, these assumptions might be violated. Therefore, a challenging research question has been explored in the literature: how can ML models be made robust to errors and missing information in the input data? This seminar will focus on exploring the literature around this question.
  • Feedback-based Assistive Learning (Jayasankar Santhosh, pre-assigned) The main aim of the study is to investigate how feedback-based interventions could assist learners in identifying their strengths and weaknesses, and encourage them to make improvements. Rather than following a one-size-fits-all approach, feedback can be tailored to each learner's individual needs and learning style, thus making it personalized.
  • Unsupervised Representation Learning on Time Series (Philipp Engler (&FLaP)) Time series data arises from sensors everywhere around us. Machine learning algorithms can help us make sense of this data, allowing us, for instance, to predict the action a human is performing or to detect faults in a machine. Machine learning models, such as neural networks, typically require large amounts of data for training. While labeled data can be expensive to acquire due to the need for human supervision or expensive measurements, unlabeled data is often available in larger quantities. Unsupervised representation learning methods, which allow models to be pre-trained on unlabeled data, have improved significantly and gained increased attention lately. We are interested in surveying recent literature on unsupervised representation learning in the time series domain to obtain an overview of the current state of the art.
  • Satellite Image Super-Resolution (Brian Moser) Image super-resolution (SR) is the task of enhancing low-resolution images to high resolution. The goal of this survey is to explore the domain of aerial data for image SR. The task is to identify typical datasets, the data structure (multi-spectral images) and the standard training pipeline, and to highlight the differences from classical single-image or multi-image SR. Next, various approaches and architectures should be explored to identify the capabilities of state-of-the-art methods with their strengths and weaknesses, and to find interesting research avenues for future work, which can form an opportunity to work further on this topic.
  • Deep Learning Approaches for Requirements Engineering (Summra Saleem) Following the success of artificial intelligence approaches in diverse application areas (energy, NLP and bioinformatics), the software development industry is trying to utilize the power of deep learning methods to develop more accurate and reliable software. Specifically, requirements engineering through deep-learning-based approaches is an active area of research in software development. The prime objective of this topic is to highlight important areas of requirements engineering where deep learning has already been applied and to summarize trends in deep learning approaches that have been introduced in the last few years.
  • Artificial Intelligence for molecular compound property prediction (Muhammad Nabeel Asim) Molecular compounds are extensively used for understanding biological processes and are key players in drug development. Comprehensive knowledge of compounds' properties assists in understanding the impact of compounds on diverse types of drugs and diseases. A wide range of machine and deep learning approaches have been proposed for compound property prediction. The aim of this review topic is to summarize these existing approaches.
  • Machine learning approaches for Antibody sequence analysis (Muhammad Nabeel Asim) Antibodies are short protein sequences produced by the human immune system. Based on their physical and chemical structure and on biological processes, antibody sequences are categorized into different classes. Automated classification of antibody sequences supports the drug development process and helps in understanding biological processes. The development of machine learning approaches for antibody sequence analysis is an active area of research. The aim of this review topic is to summarize the diverse types of machine and deep learning approaches that have been developed for antibody sequence analysis tasks.
  • Satellite radar data for ML models (Cristhian Sanchez, pre-assigned) Optical satellite images are conventionally used in ML models because of the information one can acquire across different parts of the electromagnetic spectrum, yet optical images are limited by weather conditions. This limitation can be overcome by using synthetic aperture radar (SAR) imaging data, e.g. Sentinel-1 products, alone or in combination with optical data. On the other hand, the complexity of radar data may be challenging. Therefore, the goal of this seminar is to provide an overview of the ML techniques used to handle SAR data and their fields of application (e.g. agriculture, terrain deformations).
  • Discovering the usefulness of Animated Avatars (Ko Watanabe, pre-assigned) Animated avatars are widely used in the current era. Media creators use avatars as speakers to transmit knowledge to others. In the educational field, animated avatars have a high potential to improve students' learning. In this seminar, we aim to explore how researchers have approached and worked on using avatars as a user interface.
  • Flood Prediction: Survey of ML-based rainfall-runoff modeling (Dinesh Krishna Natarajan) Climate change is causing significant changes to the Earth's climate patterns, leading to more intense and frequent precipitation events and increasing the risk of extreme weather events such as flash floods. Flash floods are especially dangerous because they occur with little to no warning, leaving limited time for safety precautions and emergency response. To facilitate faster and more extensive flood prediction models, large-scale hydrology datasets and ML-based rainfall-runoff models have been developed in recent years. In rainfall-runoff modeling, data about rainfall events, the topography of the area, soil type, and other factors that affect how water flows through the landscape are used as input data. The models then predict how much water will flow into a river or stream during a given rainfall event, and how quickly it will do so. Such models help predict the water runoff during extreme rainfall events in areas at risk of flash floods. In this seminar, the goal is to extensively study the existing open-source hydrology datasets and data-driven rainfall-runoff models for flood prediction.
  • Aspect-Level Sentiment Analysis & Perspective-Level Sentiment Analysis (Marc Gänsler) Sentiment analysis is the process of using natural language processing (NLP) techniques to identify and extract the emotional tone and polarity of text. It involves analyzing the language used in a piece of text to determine whether the overall sentiment expressed is positive, negative, or neutral. Aspect-level sentiment analysis (ALSA), also known as aspect-based sentiment analysis, is a more specific type of sentiment analysis that aims to identify the sentiment associated with individual aspects or features of a product, service, or news event. In other words, it seeks to determine not only whether a review or piece of text is positive or negative overall, but also which specific aspects of the product or service the sentiment is related to. Perspective-level sentiment analysis (PLSA) is another variation. Consider the following news text: "Fabiano Caruana wins the chess match against Magnus Carlsen." The sentiment score for this text depends on whether it is viewed from Carlsen's or Caruana's perspective. The goal is to examine the current state of research on ALSA and PLSA and to discuss which ML technologies currently play a leading role here.
  • Test-time domain adaptation (Jayanth Siddamsetty) The aim of this seminar is to introduce test-time domain adaptation and discuss the state-of-the-art methods. Test-time domain adaptation refers to the process of adapting a machine learning model to perform well on data from a new domain at the time of testing. The goal is to improve the model's generalization performance on the new domain without requiring a large amount of labeled data for training.
  • Generating multivariate and geospatial synthetic datasets (Julia Mayer) Data protection law justifiably sets high hurdles for the use of resident population data. However, such data is often of great interest in the Smart City area. In order to test methods and visualizations in a meaningful way, the idea is to use artificially generated datasets that have the same statistical characteristics as the real datasets. In this report, different approaches to generating synthetic datasets, and their advantages and disadvantages for different applications in a Smart City, shall be discussed.
  • Tree detection with aerial images in urban areas (Johannes Ruf) Urban tree detection using aerial imagery is crucial for city planning, environmental monitoring, and assessing green infrastructure. This task, however, is challenging due to factors such as diverse tree species, varying tree densities, shadows, and the presence of urban infrastructure. This seminar will explore the latest tree detection techniques in aerial images, focusing on machine learning and deep learning approaches.
  • Estimating Building Characteristics and City Structure Types using AI (Julia Mayer) Building databases are a fundamental component of urban analysis. However, such databases usually lack detailed attributes such as building age and type. In this report, different ML and DL approaches using data sources such as aerial images, LIDAR data and street view images should be collected, described and evaluated.
  • Cross-Modal Retrieval in Remote Sensing (Francisco Mena) Searching for information is a natural human activity. However, when data has a complex structure, such as Remote Sensing (RS) images, it might be difficult to describe to a search engine in natural language what you are looking for. Using an object from another modality (image, text, or audio) as a query could describe the request better and be more suitable for finding similar objects, e.g. audio to image. In RS, one might use a low-quality image with few spectral bands to search for another image with high quality and more detailed spectral bands. This seminar will focus on reviewing cross-modal retrieval (i.e. using an object from one modality/domain to search for objects in another) in the context of RS images.
  • Self-Supervised Pretext Tasks in Earth Observation (Marco Stricker) Satellites are constantly observing the Earth and generating large amounts of data at high frequency. Due to this scale, it is only feasible to analyze the data automatically using AI. However, the collected data is unlabeled, and finding annotated datasets for the large number of possible use cases is challenging. Even if such datasets exist, the number of samples is mostly limited. To alleviate this problem, several researchers have started using self-supervised learning, where a model is first trained on a pretext task in an unsupervised manner and then adapted to the intended task using a small labeled dataset. The goal of this seminar is to identify the most popular pretext tasks in the context of Earth Observation.
  • Attention mechanism for model explanation (Hiba Najjar) An attention mechanism assigns each feature an attention weight, which is commonly interpreted as the importance of this feature to the model’s final prediction. While many studies build their analysis on this intuitive assumption, some researchers have challenged this idea and proposed counter-examples and reliability tests to either further strengthen or refute this assumption. In this seminar, the student will track down and summarize the different experiments proposed in the literature to decide whether one can use the attention weights learned by different attention mechanisms as feature importance scores.
  • Multi-task Learning and semantic bottlenecks in DL (Hiba Najjar) Multi-task learning (MTL) is a subfield of machine learning which often takes the form of training a single model on multiple related tasks. This enables the network to learn a shared representation for different objectives, and thus comes with many benefits, such as reducing overfitting, improving data efficiency and learning faster by leveraging auxiliary information. Semantic bottlenecks are a particular type of MTL: they use the prediction results of an intermediate task made of human-understandable concepts to predict the target task. Semantic bottlenecks are commonly used as an explainability approach to increase the transparency of the network. In this seminar, the student will review the different MTL approaches and investigate how they are or can be used to increase the explainability of a neural network.
  • Deep Learning in groundwater potential mapping (Atharva Kavitkar) Groundwater potential refers to the possibility of finding and extracting groundwater in a particular location. This seminar will provide an extensive analysis of deep learning techniques used in groundwater potential mapping. It will evaluate the strengths and limitations of deep learning techniques and compare their performance to conventional groundwater mapping approaches. The survey will also highlight challenges related to the use of deep learning techniques, such as the requirement for large and diverse remote sensing datasets. This literature review aims to offer a comprehensive understanding of the current state of the art of deep learning in groundwater potential mapping and identify potential directions for future research.
  • Indicators for Groundwater on Remote Sensing Images (Marcela Charfuelan) Remote sensing (RS) data mostly provides information about the surface of the Earth (surface reflectance) and in some cases information about some centimeters or meters below the surface (radar data). Groundwater is the water present beneath the Earth's surface, in some cases several meters deep, so what indicators of groundwater can we observe in remote sensing data? In most cases RS does not provide direct information about groundwater, but it has been successfully used as an indirect way of mapping potential groundwater locations. This is because the morphological, hydrological and geological characteristics of the surface govern the subsurface water conditions. In this seminar we will study recent techniques employed to map indicators of groundwater in RS data for particular locations.
  • Efficient learning with data augmentation curricula (Tobias Nauen) Data augmentation is a widely used technique in machine learning to enhance the performance of models by artificially generating additional training data from existing datasets. This approach has been shown to improve the accuracy and robustness of models trained on various tasks, including computer vision and natural language processing. Curriculum learning is a machine learning technique that involves presenting training data to a model in a gradually increasing order of difficulty, allowing the model to gradually learn more complex concepts and improve its performance. In this seminar, the student will discover how these two approaches are combined in recent works from different domains, such as natural language processing and deep reinforcement learning.
  • Continuous learning for biodiversity monitoring with remote sensing (Diego Arenas) Monitoring biodiversity is relevant for climate adaptation and for measuring the extinction risk of species. Machine learning (ML) models can help to model and estimate the presence or absence of thousands of species; common approaches include MaxEnt, tree-based models and deep learning. The presence data is updated every day, and retraining thousands of predictive models might not be feasible, depending on the number of monitored species. Continual learning presents an alternative that uses newly available data to improve the deployed ML model. The student will look into the available literature and provide an overview of state-of-the-art strategies and techniques in continual learning that are used or can be applied to monitor biodiversity globally using remote sensing datasets.
  • AutoML for Remote Sensing (Diego Arenas) Automated machine learning (AutoML) offers useful capabilities for automating the cleaning, preprocessing, experiment design, training and deployment processes of machine learning (ML). The student will review the recent literature on AutoML methods and techniques and prepare a summary with recommendations of methods that are usable in the context of remote sensing and earth observation.
  • Deep Generative Models for Tabular Data (Dayananda Herurkar, pre-assigned) Deep Generative Models (DGMs) such as GANs are powerful. Currently, they are used in various tasks such as high-quality image and video generation, text-to-image translation, image enhancement, and reconstruction of 3D models of objects from images. However, most of these approaches focus on the image domain. This seminar will explore and study how DGMs are used for tabular data generation.
  • Physics Informed Neural Networks for automotive tasks (Thorben Menne, Peter Schichtel) Detailed simulations are used to virtually test and evaluate sound quality in a car's interior during the development phase. However, acoustic simulations over many orders of magnitude of frequency and a multitude of boundary materials are computationally expensive and can take a long time. A promising alternative to classical FEM and ray-tracing-based solvers are Physics Informed Neural Networks (PINNs), as they promise similar accuracy with improved inference speed in high-dimensional simulations. Your task in this seminar is to review the current state of PINN-driven solutions for differential equations and their performance compared to classical solvers, with a focus on automotive-related tasks like acoustic simulation or battery simulation.
  • Multimodal AI for visual question answering on business and software diagrams (Max Märker, Peter Schichtel, pre-assigned) Since the recent breakthroughs in natural language modelling using Large Language Models (LLMs) like ChatGPT, the industry is pushing towards models that can digest multiple different sources of data simultaneously. In real-world settings, data is often encoded in different formats. In the context of business analysis and software development, most of the important information is embedded in human-understandable documentation and visual representations such as diagrams and charts. One important task is answering questions based on these textual and visual inputs and extracting information in a machine-readable format such as graph databases. New approaches like Salesforce's BLIP, Google's Matcha or OpenAI's GPT-4 use attention-based visual transformers in conjunction with LLMs to achieve multimodality. For the seminar work, you will research different model architectures and task objectives used for multimodal models, compare the performance of these models based on publicly available benchmark data, and outline how the field of multimodal AI may develop in the near future.
  • Deep Learning for Above Ground Biomass (AGB) Mapping (Jayanth Siddamsetty) Above ground biomass mapping refers to the process of measuring and quantifying the amount of living vegetation in a particular area, typically using remote sensing techniques. This information is critical for monitoring changes in ecosystems and assessing their overall health, as well as for guiding sustainable land-use practices and climate change mitigation efforts. The goal of this seminar is to introduce the topic and explain the state-of-the-art deep learning methods used for AGB mapping.
  • Quantum Computing & Quantum-Enhanced Machine Learning (Damian Hofmann, Peter Schichtel) Quantum computers are designed to exploit quantum physics to enable more efficient algorithms for important classes of computational problems. In particular, potential applications of quantum computers to machine learning problems have been proposed. The goal of this seminar topic is to provide a high-level introduction to quantum computing and discuss an example machine learning application (of the student's choice) with its potential benefits and limitations.

Topics marked as pre-assigned have already been assigned to students who have previously worked with their DFKI supervisors.

Contact