Sequence-based Learning

From handling noisy data in multivariate learning to multilevel time series forecasting


Research on time series analysis has gained momentum in recent years, as insights from time series analysis can improve decision-making in industrial and scientific domains. Time series analysis aims to describe patterns and trends that occur in data over time. Among its many useful applications, classification, regression, forecasting, and anomaly detection of time points and events in sequences (time series) are particularly noteworthy, as they contribute important information to, for example, business decision-making. In today's information-driven world, countless numerical time series are generated by industry and research on any given day. Many applications, including biology, medicine, finance, and industry, involve high-dimensional time series. Dealing with such large datasets raises several new and interesting challenges.

Challenges in natural processes

Despite significant developments in multivariate analysis modeling, problems still occur when dealing with high-dimensional data, because not all variables directly affect the target variable. As a result, predictions become inaccurate when unrelated variables are considered. This is often the case in practical applications such as signal processing. Natural processes, such as those found in the applications described below, generate data that is best described by a multivariate stochastic process, so that the relationships that exist between the individual time series can be taken into account.

Regression: »Efficient search and representation of tracking data«


In the »Efficient Search and Representation of Tracking Data« application, spatio-temporal data such as video tracking data is processed. For example, AI accelerates the search for sequences of coordinates (game scenes) representing ball and player trajectories. This includes using Siamese networks and autoencoders to learn a distance-preserving projection into an embedding space and then searching within it. Sequence-based AI methods are also used for feature extraction from spatio-temporal trajectories in order to evaluate game scenes. In addition, the movements of defenders (defensive behavior) are predicted with long short-term memory (LSTM) cells using multi-agent imitation learning. Reinforcement learning is also used to generate new, creative game sequences from action sequences (actions) and trajectories.
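
As a rough illustration of the embedding approach, the following PyTorch sketch trains a shared encoder with a contrastive loss so that similar game scenes land close together in the embedding space. This is a minimal stand-in, not the project's actual code: the GRU encoder, the contrastive loss, and all names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Maps a (T, 2) coordinate sequence to a fixed-size embedding."""
    def __init__(self, embed_dim=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, traj):               # traj: (batch, T, 2)
        _, h = self.rnn(traj)              # h: (1, batch, hidden)
        return F.normalize(self.proj(h[-1]), dim=-1)

def contrastive_loss(z1, z2, similar, margin=0.5):
    """Pulls embeddings of similar scenes together and pushes
    dissimilar ones at least `margin` apart."""
    d = (z1 - z2).pow(2).sum(-1).sqrt()
    return (similar * d.pow(2) +
            (1 - similar) * F.relu(margin - d).pow(2)).mean()

# Both inputs pass through the SAME encoder (the Siamese property).
encoder = TrajectoryEncoder()
a = torch.randn(8, 50, 2)                    # batch of 50-step 2D trajectories
b = torch.randn(8, 50, 2)
similar = torch.randint(0, 2, (8,)).float()  # 1 = pair shows similar scenes
loss = contrastive_loss(encoder(a), encoder(b), similar)
loss.backward()
```

Once such an encoder is trained, scene retrieval reduces to a fast nearest-neighbor lookup over the normalized embeddings.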

Forecast: »Data-driven positioning«


The »Data-driven positioning« application deals with time series data collected using synchronized antennas. System synchronization is error-prone, and the collected signal streams are subject to multipath dispersion, fading effects, temperature, motion dynamics in the system, and long-term stochastic noise. Temporal correlations in the data can help to remove quasi-static noise (so-called denoising) and thereby expose informative features. Furthermore, data-driven motion models used to analyze motion over time increase the accuracy of position prediction (so-called forecasting), which otherwise suffers from the highly simplified motion descriptions of conventional model-driven filters.
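
To make the forecasting idea concrete, here is a minimal sketch of a data-driven motion model: an LSTM that predicts the next position from a window of past positions. It is only an illustration of how such a model might look, not the deployed system; all names, dimensions, and the toy data are assumptions.

```python
import torch
import torch.nn as nn

class MotionForecaster(nn.Module):
    """Predicts the next position from a window of past (noisy) positions."""
    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, window):             # window: (batch, T, dim)
        out, _ = self.lstm(window)         # hidden state summarizes the motion
        return self.head(out[:, -1])       # estimate of the next position

model = MotionForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
past, nxt = torch.randn(32, 20, 3), torch.randn(32, 3)   # toy stand-in data
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(past), nxt)
loss.backward()
optimizer.step()
```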

Forecast: »Comprehensible AI for multimodal state recognition«

The application »Comprehensible AI for multimodal state recognition« processes data from a complex signal processing chain and uses the temporal sequence to identify correlations between different information sources and to predict higher-level actions, e.g. »Four pedestrians are walking on the sidewalk three meters away and will reach the crosswalk in 57 seconds«.

Anomaly detection: »Intelligent Power Electronics« and »Monitoring and fault diagnosis of industrial wireless systems«

In the applications »Intelligent Power Electronics« and »Monitoring and fault diagnosis of industrial wireless systems«, time series data from different signal processing chains are also processed. In both cases, monitoring the data streams over time is necessary to distinguish desired changes from anomalous ones. The time history makes it possible to identify and localize sources of interference that cannot be detected from a snapshot view of the data.
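
A minimal sketch of this principle flags samples that deviate strongly from the recent signal history; the window size and threshold are illustrative assumptions, not project parameters.

```python
import numpy as np

def rolling_zscore_anomalies(x, window=100, threshold=4.0):
    """Flag samples that deviate strongly from the recent signal history.
    A single-sample snapshot could not reveal such deviations."""
    flags = np.zeros_like(x, dtype=bool)
    for t in range(window, len(x)):
        hist = x[t - window:t]                 # recent time history
        mu, sigma = hist.mean(), hist.std() + 1e-9
        flags[t] = abs(x[t] - mu) / sigma > threshold
    return flags

signal = np.random.randn(1000)
signal[700] += 10.0                            # injected fault
print(np.where(rolling_zscore_anomalies(signal))[0])   # expect: [700]
```

In a real deployment, the rolling statistics would be replaced by learned temporal models, but the principle remains: the time history, not a snapshot, exposes the anomaly.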

Together with the applications, various research areas have been identified

Application-specific model optimization

Core areas of interest are appropriate data collection (»What information needs to be collected?«), data analysis (»What level of abstraction is optimal for the problem and process at hand?«), data preprocessing (»How must the data be normalized and standardized to enable meaningful predictions?«), and the derivation of optimal features and architectures for a well-defined, and if possible atomic, problem involving stochastic processes. In an initial analysis phase, application-specific optimal classification and regression methods are derived for target categories and variables, but also for the identification, detection, and prediction of anomalies. The effects of fusing temporal, spatial, spectral, and mixed information extractors on the quality of the results are also investigated. Another application-specific focus is the temporal architecture of neural networks, such as the context vectors in long short-term memory (LSTM) cells, as well as attention over, and traceability of, long-term, short-term, and future dependencies in continuous streams of information.
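
To make context vectors and attention concrete, the following sketch classifies a multivariate sequence by attending over all LSTM hidden states instead of relying only on the final context vector. It is a generic, hedged example; the architecture and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    """Classifier that weights all LSTM hidden states by learned attention
    rather than relying on the final context vector alone."""
    def __init__(self, n_features, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)        # attention scoring per step
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, T, n_features)
        h, _ = self.lstm(x)                      # h: (batch, T, hidden)
        w = torch.softmax(self.score(h), dim=1)  # (batch, T, 1) step weights
        context = (w * h).sum(dim=1)             # attention-weighted summary
        return self.out(context)

model = AttentiveLSTM(n_features=8)
logits = model(torch.randn(4, 120, 8))           # 4 sequences of 120 steps
```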

Uncertainty minimization of prediction methods

Another focus is reducing the uncertainty of prediction methods: »How can the error variance and bias of the prediction be reduced as the complexity and dimension of the data increase?« To this end, the competence pillar explores the effects of Monte Carlo dropout methods on model accuracy and uncertainty, and how to balance the two; among other things, the deep coupling of temporal neural networks with Bayesian methods for reliable prediction is investigated.
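
As a minimal sketch of Monte Carlo dropout for regression (assuming a generic feed-forward model; the key point is that dropout remains active at inference, so repeated forward passes approximate a predictive distribution):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=100):
    """Keep dropout active at inference and sample repeated forward passes;
    the spread of the samples approximates the model's uncertainty."""
    model.train()                    # train mode keeps dropout stochastic
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)   # predictive mean and spread

x = torch.randn(8, 16)
mean, std = mc_dropout_predict(net, x)
```

The standard deviation across the sampled predictions serves as a per-prediction uncertainty estimate that can then be balanced against model accuracy.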

Time series data are omnipresent in the overall project

Although time series data is not always directly obvious, from a methods perspective it is almost always useful to identify temporal relationships in the underlying data and information. Often, additional temporal intercorrelations are hidden in the data that can be profitably exploited for the task at hand. The number of scientific contributions to the competence pillar shows that there is great interest in time-series-based learning methods in both the methodological and the application-centered research communities.

Project PROSPER: »Structural Framework for Time Series Forecasting with Neural Networks in PyTorch«

Applicability analysis has shown that recurrent neural networks (RNNs), and especially historical consistent neural networks (HCNNs), offer great potential for industrial and macroeconomic applications on time series data, as they provide higher forecasting quality than other state-of-the-art techniques. RNNs, and especially HCNNs, have been able to demonstrate their advantage over »no-risk scenarios«, in which the same targets are processed using shorter time ranges, in several price forecasting applications (electricity, copper, and steel price forecasting, etc.). The quality of the forecast is improved because future descriptive features are used in the prediction of future time steps.
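
The following is a much-simplified PyTorch sketch of the HCNN idea, not the actual PROSPER framework: all observables are embedded in a single shared state, teacher forcing is approximated by substituting observations into the state (real HCNNs use an error-correction mechanism), and the forecast emerges from rolling the state beyond the last observation. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleHCNN(nn.Module):
    """Greatly simplified HCNN-style forecaster: all observables are part of
    one shared hidden state that a single transition rolls forward; the
    forecast is produced by letting the state run past the last observation."""
    def __init__(self, n_obs, n_state=32):
        super().__init__()
        self.A = nn.Linear(n_state, n_state, bias=False)  # shared state transition
        self.s0 = nn.Parameter(torch.zeros(n_state))      # learned initial state
        self.n_obs = n_obs            # first n_obs state entries are observable

    def forward(self, y, horizon):    # y: (T, n_obs) observed multivariate series
        s, outputs = self.s0, []
        for t in range(y.shape[0] + horizon):
            outputs.append(s[:self.n_obs])     # model expectation at step t
            if t < y.shape[0]:
                # teacher forcing: replace the observable part of the state
                # with the true measurement before rolling forward
                s = torch.cat([y[t], s[self.n_obs:]])
            s = torch.tanh(self.A(s))
        return torch.stack(outputs)            # (T + horizon, n_obs)

model = SimpleHCNN(n_obs=4)
y = torch.randn(100, 4)                        # toy multivariate history
pred = model(y, horizon=12)
loss = nn.functional.mse_loss(pred[:100], y)   # fit the past
loss.backward()
```

Training minimizes the error between the first T outputs and the observed series; the remaining `horizon` outputs constitute the forecast.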

However, since many of these applications were developed as part of industry projects rather than research activities, an evaluation with the necessary scientific rigor on publicly available datasets has been lacking. In scientific discourse and in a series of tests in real-world projects, three major problems have emerged that have so far prevented the broad and successful use of HCNNs: the selection of the optimal architecture (e.g., a priori feature extraction), the reliability of the models' uncertainty estimates, and the comparison with prominent state-of-the-art and scientific methods. Researchers in the competence pillar addressed these challenges in a cross-campus research collaboration.

»ADA wants to know« Podcast

In our new podcast series, »ADA wants to know«, the people responsible for the competence pillars are in conversation with ADA and provide insight into their research priorities, challenges, and methods. In this episode, ADA talks with Automated Learning expert Christopher Mutschler.

Our focus areas within AI research

Our work at the ADA Lovelace Center is aimed at developing the following methods and procedures in nine domains of artificial intelligence from an applied perspective.

Automatic learning

Automatic learning covers a vast field that ranges from automated feature recognition and selection for datasets, model search and optimization, and automated evaluation of these processes through to adaptive model adjustment using training data and system feedback. It plays a key role in areas such as assistance systems for data-driven decision support.

Sequence-based learning

Sequence-based learning concerns itself with the temporal and causal relationships found in data in applications such as language processing, event processing, biosequence analysis, or multimedia files. Observed events are used to determine the system’s current status, and to predict future conditions. This is possible both in cases where only the sequence in which the events occurred is known, and when they are labelled with exact time stamps.

Experience-based learning

Experience-based learning refers to methods whereby a system is able to optimize itself by interacting with its environment and evaluating the feedback it receives, or dynamically adjusting to changing environmental conditions. Examples include automatic generation of models for evaluation and optimization of business processes, transport flows, or control systems for robots in industrial production.

Few Labels Learning

Major breakthroughs in AI involving tasks such as language recognition, object recognition or machine translation can be attributed in part to the availability of vast annotated datasets. Yet in many real-life scenarios, particularly in industry, such datasets are much more limited. We therefore conduct research on learning using small annotated datasets in the context of techniques for unsupervised, semi-supervised and transfer learning.

Explainable learning

For several years, we have seen unbridled growth in the volume of digital data in existence, giving rise to the field of big data. When this data is used to generate knowledge, there is a need to explain the ensuing results and forecasts to users in a plausible and transparent manner. At the ADA Center, this issue is explored under the heading of explainable learning, with the goal of boosting acceptance of artificial intelligence among users in industry, research and society at large.

Mathematical optimization

Mathematical optimization plays a crucial role in model-based decision support, providing planning solutions in areas as diverse as logistics, energy systems, mobility, finance, and building infrastructure, to name but a few examples. The Center is expanding its already extensive expertise in a number of promising areas, in particular real-time planning and control.

Semantics

The task of semantics is to describe data and data structures in a formally defined, standardized, consistent and unambiguous manner. For the purposes of Industry 4.0, numerous entities (such as sensors, products, machines, or transport systems) must be able to interpret the properties, capabilities or conditions of other entities in the value chain.

Few Data Learning

We use few data learning to address key research issues involved in processing and augmenting data, or generating sufficient datasets, for instance in AI applications using material master data in industry. This includes processing flawed datasets and using simulation techniques to generate missing data.

You might also be interested in

Positioning and networks

The research area »Positioning and networks« at Fraunhofer IIS offers further research results and application examples for the competencies and methods described here.

What the ADA Lovelace Center offers you


Together with its cooperation partners, the ADA Lovelace Center for Analytics, Data and Applications offers continuing education programs on concepts, methods, and concrete applications in the field of data analytics and AI.

Seminars with the following focus topics are offered: