Few Labels Learning

Learning with few annotated data

The groundbreaking successes of artificial intelligence (AI) in tasks such as speech recognition, object recognition, and machine translation are due in part to the availability of enormously large annotated data sets. Annotated data, also called labeled data, contains the label information that describes the meaning of individual data points and is essential for training machine learning models. In many real-world scenarios, especially in industrial environments, large amounts of data are often available, but they are unannotated or only poorly annotated. This lack of annotated training data is one of the major obstacles to the broad application of AI methods in industry. In the competence pillar »Few Labels Learning«, learning with little annotated data is therefore explored across different domains within three focus areas: meta-learning strategies, semi-supervised learning, and data synthesis.

Meta-learning strategies for pathology and autonomous systems

In the context of implementing »meta-learning strategies«, methods such as few-shot learning and transfer learning are researched and developed, among others in the »field of medical imaging«.


Especially in the field of tissue classification in medicine, annotating large data sets is particularly difficult, and the data situation often varies from hospital to hospital or from imaging device to imaging device. Few-shot learning methods, such as Prototypical Networks, are therefore well suited to generalizing between different applications. Transfer learning methods, in which models are pre-trained on comparable data sets with many annotated data points and then applied to the actual problem, are also used in this context for the practical evaluation of medical CT scans.
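
To make the few-shot idea concrete, the following sketch illustrates how a Prototypical Network classifies query samples by their distance to class prototypes computed from a handful of annotated support samples. It is a minimal, self-contained illustration in Python/NumPy with a stand-in embedding function, not the implementation used in the project.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))   # stand-in for a pre-trained embedding network

def embed(x):
    """Map raw samples (here 64-dim vectors) to 16-dim feature embeddings."""
    return np.tanh(x @ W)

def prototypical_predict(support_x, support_y, query_x):
    """Classify queries by distance to per-class prototypes (mean support embeddings)."""
    classes = np.unique(support_y)
    z_support = embed(support_x)
    prototypes = np.stack([z_support[support_y == c].mean(axis=0) for c in classes])
    z_query = embed(query_x)
    # squared Euclidean distance of each query embedding to each prototype
    d = ((z_query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # softmax over negative distances yields class probabilities
    logits = -d
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[np.argmax(probs, axis=1)], probs

# Example 3-way, 5-shot episode with synthetic data
support_x = rng.normal(size=(15, 64))
support_y = np.repeat([0, 1, 2], 5)
query_x = rng.normal(size=(6, 64))
labels, probs = prototypical_predict(support_x, support_y, query_x)
```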

The application »Robust AI for Digital Pathology« uses few-shot learning methods for interactive tissue classification. Because users can interact with the models, newly emerging tissue classes can also be taken into account.

In the application »AI Framework for Autonomous Systems«, meta-learning strategies for autonomous driving are being researched. Reinforcement learning models are pre-trained in a simulation environment and then adapted for real-world application via transfer learning. In another part of this project, continuous learning is used to train models that can quickly and flexibly adapt to emerging scenarios in autonomous driving.
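
The pre-train-then-fine-tune pattern described above can be sketched in a few lines. The example below is purely schematic: a simple softmax policy head is first trained on plentiful simulated (state, action) pairs and then fine-tuned on a handful of real-world samples at a lower learning rate. The supervised update stands in for the actual reinforcement learning objective, and all data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(W, states, actions, lr, steps):
    """Gradient descent on the cross-entropy between policy output and reference actions."""
    n, k = states.shape[0], W.shape[1]
    onehot = np.eye(k)[actions]
    for _ in range(steps):
        probs = softmax(states @ W)
        grad = states.T @ (probs - onehot) / n
        W = W - lr * grad
    return W

n_state, n_action = 8, 4
W = np.zeros((n_state, n_action))

# 1) Pre-train the policy head on plentiful simulated experience
sim_states = rng.normal(size=(5000, n_state))
sim_actions = rng.integers(0, n_action, size=5000)
W = train(W, sim_states, sim_actions, lr=0.5, steps=200)

# 2) Transfer: continue training on a handful of real-world samples,
#    using a lower learning rate so the pre-trained knowledge is retained
real_states = rng.normal(size=(50, n_state))
real_actions = rng.integers(0, n_action, size=50)
W = train(W, real_states, real_actions, lr=0.05, steps=50)
```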

Semi-supervised learning in a time series context

Semi-supervised learning methods are applied in scenarios where a lot of training data is available but only a small fraction of it is annotated. These methods additionally incorporate latent information from the unannotated data, such as similarities between data points, into model training in order to obtain high-performance machine learning models.

In one subproject, methods from the field of »Consistency Regularization« are researched and developed for application to sequential sensor data. Furthermore, semi-supervised learning strategies are used in a project in the area of camera-based automated garbage sorting to mitigate the problem of scarce annotated training data.
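
As an illustration of the consistency regularization idea on time-series data, the sketch below adds an unsupervised term that pushes the model's predictions on an unlabeled sensor window and on a weakly augmented (jittered) copy to agree, on top of the usual supervised loss on the few labeled windows. The linear classifier and the data are placeholders, not the methods developed in the subproject.

```python
import numpy as np

rng = np.random.default_rng(2)
window_len, n_classes = 32, 3
W = rng.normal(scale=0.1, size=(window_len, n_classes))  # toy classifier on raw windows

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(x):
    return softmax(x @ W)

def jitter(x, sigma=0.05):
    """Weak time-series augmentation: add Gaussian noise to the sensor window."""
    return x + rng.normal(scale=sigma, size=x.shape)

def semi_supervised_loss(x_lab, y_lab, x_unlab, lam=1.0):
    # Supervised cross-entropy on the few labeled windows
    p_lab = predict(x_lab)
    ce = -np.log(p_lab[np.arange(len(y_lab)), y_lab] + 1e-12).mean()
    # Consistency term: predictions on an unlabeled window and its augmented
    # copy should agree (mean squared difference of class probabilities)
    p_clean = predict(x_unlab)
    p_aug = predict(jitter(x_unlab))
    consistency = ((p_clean - p_aug) ** 2).sum(axis=1).mean()
    return ce + lam * consistency

x_lab = rng.normal(size=(8, window_len))
y_lab = rng.integers(0, n_classes, size=8)
x_unlab = rng.normal(size=(128, window_len))
loss = semi_supervised_loss(x_lab, y_lab, x_unlab)
```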

Data synthesis in the field of data-driven localization

The annotation of training data is technically challenging and time-consuming in the area of localization. Because of this, it is important to develop suitable methods that support data acquisition and annotation.
In the application »Data-Driven Localization«, a measurement platform is developed for efficient data annotation using Active Learning and incorporating Predictive Uncertainty in order to suggest measurement points to the user that are expected to yield the greatest information gain.
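
A minimal sketch of such uncertainty-driven active learning is shown below: among a pool of candidate measurement points, those whose model predictions have the highest predictive entropy are suggested first. The candidate probabilities are synthetic placeholders rather than output of the actual localization model.

```python
import numpy as np

rng = np.random.default_rng(3)

def predictive_entropy(probs):
    """Entropy of the predictive class distribution for each candidate point."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Hypothetical model output: class probabilities for 200 candidate measurement points
candidate_probs = rng.dirichlet(alpha=np.ones(4), size=200)

# Suggest the measurement points with the highest predictive uncertainty,
# i.e. those expected to yield the largest information gain when labeled
n_suggest = 5
ranking = np.argsort(predictive_entropy(candidate_probs))[::-1]
suggested_points = ranking[:n_suggest]
```
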
In addition, generating the measurement or training data for data-driven localization requires taking into account the statistical distribution and non-linearities of signal propagation in the environment. Attention must therefore be paid to an uneven distribution of the class sets already during the data annotation phase.
Such effects can be compensated by resampling methods, either by discarding training data from the overrepresented class (undersampling) or by generating additional synthetic data for the underrepresented class with techniques such as »SMOTE«.
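
The following sketch shows the core of SMOTE-style oversampling: synthetic minority-class samples are created by interpolating between a minority sample and one of its nearest minority-class neighbors. It is a simplified illustration; production code would typically rely on an established implementation such as the one in imbalanced-learn.

```python
import numpy as np

rng = np.random.default_rng(4)

def smote_like_oversample(X_min, n_new, k=5):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority-class neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to all other minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbors)
        step = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + step * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Example: 20 underrepresented localization fingerprints, oversampled to 60
X_minority = rng.normal(size=(20, 6))
X_augmented = np.vstack([X_minority, smote_like_oversample(X_minority, n_new=40)])
```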

»ADA wants to know« Podcast

In our podcast series »ADA wants to know«, the people responsible for the competence pillars talk with ADA and provide insight into their research focuses, challenges, and methods. In this episode, ADA talks to Few Labels Learning expert Jann Goschenhofer.

Our focus areas within AI research

Our work at the ADA Lovelace Center is aimed at developing the following methods and procedures in nine domains of artificial intelligence from an applied perspective.

Automated Learning

Automated learning covers a vast field that ranges from automated feature recognition and selection for datasets, model search and optimization, or automated evaluation of these processes through to adaptive model adjustment using training data and system feedback. It plays a key role in areas such as assistance systems for data-driven decision support.

Sequence-Based Learning

Sequence-based learning concerns itself with the temporal and causal relationships found in data in applications such as language processing, event processing, biosequence analysis, or multimedia files. Observed events are used to determine the system’s current status, and to predict future conditions. This is possible both in cases where only the sequence in which the events occurred is known, and when they are labelled with exact time stamps.

Experience-Based Learning

Experience-based learning refers to methods whereby a system is able to optimize itself by interacting with its environment and evaluating the feedback it receives, or dynamically adjusting to changing environmental conditions. Examples include automatic generation of models for evaluation and optimization of business processes, transport flows, or control systems for robots in industrial production.

Few Labels Learning

Major breakthroughs in AI involving tasks such as language recognition, object recognition or machine translation can be attributed in part to the availability of vast annotated datasets. Yet in many real-life scenarios, particularly in industry, such datasets are much more limited. We therefore conduct research on learning using small annotated datasets in the context of techniques for unsupervised, semi-supervised and transfer learning.

For several years, we have seen unbridled growth in the volume of digital data in existence, giving rise to the field of big data. When this data is used to generate knowledge, there is a need to explain the ensuing results and forecasts to users in a plausible and transparent manner. At the ADA Center, this issue is explored under the heading of explainable learning, with the goal of boosting acceptance for artificial intelligence among users in industry, research and society at large.

Mathematical optimization plays a crucial role in model-based decision support, providing planning solutions in areas as diverse as logistics, energy systems, mobility, finance, and building infrastructure, to name but a few examples. The Center is expanding its already extensive expertise in a number of promising areas, in particular real-time planning and control.

Semantics

The task of semantics is to describe data and data structures in a formally defined, standardized, consistent and unambiguous manner. For the purposes of Industry 4.0, numerous entities (such as sensors, products, machines, or transport systems) must be able to interpret the properties, capabilities or conditions of other entities in the value chain.

Few Data Learning

We use few data learning to address key research issues involved in processing and augmenting data, or generating sufficient datasets, for instance in AI applications using material master data in industry. This includes processing flawed datasets and using simulation techniques to generate missing data.

Other topics of interest

Optimized domain adaptation

This white paper is about optimizing machine learning (ML) models when using data from similar domains. A new two-step domain adaptation approach is introduced.

Active Learning

Active learning favors labeling the most informative data samples. The performance of active learning heuristics, however, depends on both the structure of the underlying model architecture and the data. In this white paper, learn about a policy that reflects the best decisions from multiple expert heuristics given the current state of active learning and also learns to select samples in a complementary way that unifies expert strategies.
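
As a rough illustration of the ingredients, the sketch below scores a pool of unlabeled samples with two classic expert heuristics (predictive entropy and margin) and combines them with fixed weights. In the white paper this combination is replaced by a policy that is learned from the current active learning state; all data and weights here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def entropy_score(probs):
    """Expert heuristic 1: predictive entropy (higher = more informative)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def margin_score(probs):
    """Expert heuristic 2: negative margin between the top-2 classes (higher = more informative)."""
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])

# Hypothetical model predictions for a pool of 500 unlabeled samples
pool_probs = rng.dirichlet(alpha=np.ones(5), size=500)

# A stand-in "policy": weight the expert heuristics and pick the top-scoring samples.
weights = np.array([0.6, 0.4])
scores = weights[0] * entropy_score(pool_probs) + weights[1] * margin_score(pool_probs)
query_indices = np.argsort(scores)[::-1][:10]
```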

What the ADA Lovelace Center offers you

 

Together with its cooperation partners, the ADA Lovelace Center for Analytics, Data and Applications offers continuing education programs on concepts, methods, and concrete applications in the field of data analytics and AI.

Seminars with the following focus topics are offered: