Cost-efficient defect detection in lightweight metal castings using machine learning

Why use AI for non-destructive quality assurance in a production context?

In this project, we focus on the non-destructive detection of defects in lightweight metal castings, specifically car wheel rims. Common automated quality assurance systems for such components use X-ray images and algorithms based on classical image processing. Although these methods are generally effective at detecting many relevant defects, setting them up takes time and specialized personnel, especially for new components. AI-based methods promise more flexible solutions, particularly where skilled experts are scarce or where product variety makes frequent recalibration necessary. Our joint project with the Development Center X-ray Technology at Fraunhofer IIS (EZRT) aims to develop an AI capable of accurately detecting defects in X-ray images of castings.

X-ray image of a rim with annotated defects. Source: Schön, T., Gosswami, B., Hvingelby, R., Suth, D., Kemeter, L. M., & Sierak, P. (2022). Automated defect recognition in X-ray projections using neural networks trained on simulated and real-world data.

Data efficiency as a key for AI applications in production

Acquiring and preparing real data is crucial for the success of AI applications in production, but it is often a significant cost factor. For lightweight metal rims, an expert has to find and label defects in the images manually, which takes considerable time and resources. Because conventional AI systems for image recognition need a large amount of labeled data for training and evaluation, the resulting costs are substantial. Our goal in this project was to develop an AI that needs as little annotated real data as possible. To achieve this, Fraunhofer EZRT has developed its own simulation pipeline for X-ray images, which generates synthetic training data in large quantities and at negligible cost.

Strategic approach to AI development with limited resources

As with most specialized AI applications in industry, the resources for AI development were limited. This is a significant difference from well-known foundation models such as those behind ChatGPT, for which no cost or effort was spared during development. The key question in such cases is how to allocate a limited budget to achieve good AI performance. For lightweight metal rims, acquiring and annotating large quantities of X-ray images would by itself have consumed a significant portion of the project resources. Instead, we invested part of the budget in generating synthetic data. The development costs of the simulation tool have to be taken into account, but once the tool exists, generating data in bulk is no longer a problem. It is also conceivable to train the AI on a mixture of simulated and real data.

Comparison of simulated data (left) with real data (right). Source: Schön, T., Gosswami, B., Hvingelby, R., Suth, D., Kemeter, L. M., & Sierak, P. (2022). Automated defect recognition in X-ray projections using neural networks trained on simulated and real-world data.

Can an AI system for defect detection in castings be efficiently trained using only simulated data?

The short answer is no. Ideally, the simulated data alone would have been sufficient to train a well-functioning AI. In practice, an AI trained only on simulated data performs well on synthetic test data but fails on real test data. The reason is the so-called domain shift between the simulated data and the real X-ray images that matter for the use case: despite the similarity between the two, there are systematic differences that prevent the AI from recognizing defects in real data. Even adding a few real images did not achieve the desired results.
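To make the idea of a domain shift concrete, here is a deliberately tiny toy example. It is not the project's model or data: each "image" is reduced to a single hypothetical feature value, all numbers are invented for illustration, and the "classifier" is a plain threshold. It only shows how a decision rule fitted on one domain can fail on a second domain that is systematically offset.

```python
# Toy illustration of a domain shift (all numbers are invented).
# Each sample is (feature_value, is_defect); the feature stands in for
# something like local contrast in an X-ray patch.

def fit_threshold(samples):
    """Use the midpoint between the two class means as decision threshold."""
    defect = [x for x, y in samples if y]
    sound = [x for x, y in samples if not y]
    return (sum(defect) / len(defect) + sum(sound) / len(sound)) / 2

def accuracy(samples, threshold):
    """Fraction of samples where 'x > threshold' agrees with the label."""
    hits = sum((x > threshold) == y for x, y in samples)
    return hits / len(samples)

# Hypothetical simulated domain: defects score high, sound material low.
simulated = [(0.8, True), (0.9, True), (0.7, True),
             (0.2, False), (0.3, False), (0.1, False)]

# Hypothetical real domain: same structure, but every value is offset,
# e.g. because of a different detector response (assumed shift).
SHIFT = 0.5
real = [(x - SHIFT, y) for x, y in simulated]

threshold = fit_threshold(simulated)
acc_simulated = accuracy(simulated, threshold)
acc_real = accuracy(real, threshold)
print(f"threshold={threshold:.2f} "
      f"simulated={acc_simulated:.2f} real={acc_real:.2f}")
```

With these invented numbers, the rule separates the simulated samples perfectly but is no better than guessing on the shifted ones, because every shifted defect now falls below the threshold.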

Approach for data-efficient AI development

Therefore, real data must also be used. The next crucial question is how to make effective use of the cheaper simulated data and what ratio of real to simulated data to choose. For lightweight metal rims, we compared several strategies for AI development and training:
1. Full supervision: The AI is trained on real X-ray images annotated by experts. Approximately 2000 such images were available to the project team for training, validation, and testing.
2. Unsupervised domain adaptation: Only non-annotated real images were used, in addition to the simulated data, for training the AI.
3. Semi-supervised domain adaptation: Mostly non-annotated real images, along with a few additional annotated real images, were used for training the AI, in addition to the simulated data.

The latter two strategies need less annotated real data, which significantly reduces annotation costs. However, they require more development effort, because additional domain adaptation approaches have to be implemented.
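The article does not say which domain adaptation method was used, so the following is only a minimal sketch of one common idea, feature alignment, under stated assumptions. It combines a supervised loss on labeled simulated data with a penalty on the gap between the mean features of simulated and unlabeled real images (the squared distance between means, which is the maximum mean discrepancy with a linear kernel). All function and parameter names are hypothetical, and the feature vectors and probabilities would come from a neural network in practice.

```python
import math

def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)

def mean_vector(features):
    """Component-wise mean of a list of equally long feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def mean_feature_gap(sim_features, real_features):
    """Squared distance between the feature means of the two domains."""
    mu_sim = mean_vector(sim_features)
    mu_real = mean_vector(real_features)
    return sum((a - b) ** 2 for a, b in zip(mu_sim, mu_real))

def adaptation_loss(sim_probs, sim_labels, sim_features, real_features,
                    real_probs=(), real_labels=(), align_weight=1.0):
    """Supervised loss on simulated data plus alignment to real data.

    Without annotated real samples this mirrors the unsupervised setting;
    passing a few (real_probs, real_labels) makes it semi-supervised.
    """
    loss = binary_cross_entropy(sim_probs, sim_labels)
    loss += align_weight * mean_feature_gap(sim_features, real_features)
    if real_labels:
        loss += binary_cross_entropy(real_probs, real_labels)
    return loss

# Invented example values, for illustration only.
sim_features = [[1.0, 0.0], [3.0, 2.0]]    # mean is [2.0, 1.0]
real_features = [[1.0, 1.0], [1.0, 3.0]]   # mean is [1.0, 2.0]
gap = mean_feature_gap(sim_features, real_features)
unsupervised = adaptation_loss([0.9, 0.1], [1, 0],
                               sim_features, real_features)
semi_supervised = adaptation_loss([0.9, 0.1], [1, 0],
                                  sim_features, real_features,
                                  real_probs=[0.6], real_labels=[1])
```

In a real training loop, a loss of this shape would be minimized by gradient descent so that the network learns features that look alike for simulated and real images. This plain-Python version only shows how the terms are combined.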

Do you want to utilize the results for your use case as well?

The problem is not unique to castings: data, and especially annotations, are rarely available in large quantities. Domain shifts, such as the one from simulated to real data, are also common in practice. Even a different camera type or changed lighting conditions can be enough to degrade the performance of an AI model. Project managers quickly find themselves having to allocate a limited budget wisely. The example of lightweight metal rims illustrates how complex this can be.

In this project, we were able to demonstrate that domain adaptation training strategies are significantly superior to traditional fully supervised training in terms of data efficiency. They achieved a level of performance similar to "full supervision" with significantly less annotated real data, or even with none at all. Note, however, that while annotation costs are saved, additional costs arise for developing the simulation and the adaptation methodology. The optimal allocation of project resources therefore has to be determined case by case, depending on the costs of annotation, development, and the acquisition of simulated and unlabeled real data.
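The trade-off can be written down as a simple cost comparison. The sketch below is an assumption-laden illustration, not a calculation from the project: every price and image count is a made-up placeholder, the class and function names are hypothetical, and it presumes that all strategies reach a comparable level of performance.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    annotated_real: int     # real images labeled by an expert
    unlabeled_real: int     # real images acquired but not labeled
    uses_simulation: bool   # needs the simulation pipeline
    uses_adaptation: bool   # needs domain adaptation development

@dataclass
class Costs:
    annotate_per_image: float
    acquire_per_image: float
    simulation_dev: float   # one-off; per-image simulation cost ignored
    adaptation_dev: float   # one-off

def total_cost(s: Strategy, c: Costs) -> float:
    """Acquisition + annotation + one-off development costs."""
    cost = (s.annotated_real + s.unlabeled_real) * c.acquire_per_image
    cost += s.annotated_real * c.annotate_per_image
    if s.uses_simulation:
        cost += c.simulation_dev
    if s.uses_adaptation:
        cost += c.adaptation_dev
    return cost

def cheapest(strategies, c: Costs) -> Strategy:
    return min(strategies, key=lambda s: total_cost(s, c))

# Placeholder image counts (not the project's actual split).
full = Strategy("full supervision", 2000, 0, False, False)
unsup = Strategy("unsupervised domain adaptation", 0, 2000, True, True)
semi = Strategy("semi-supervised domain adaptation", 100, 1900, True, True)
strategies = [full, unsup, semi]

# Two invented price scenarios in arbitrary currency units.
expensive_labels = Costs(20.0, 2.0, 15000.0, 10000.0)
cheap_labels = Costs(5.0, 2.0, 15000.0, 10000.0)

for scenario in (expensive_labels, cheap_labels):
    best = cheapest(strategies, scenario)
    print(scenario.annotate_per_image, best.name, total_cost(best, scenario))
```

With the first set of placeholder prices, the unsupervised strategy is cheapest; with cheaper annotation, full supervision wins. Which case applies depends entirely on the actual costs in a given project.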

If you are planning to develop an AI, we would be happy to help you decide how best to allocate your resources.

About the research field

ADA Lovelace Center for Analytics, Data and Applications

Competence center for data analytics and AI in industry

The ADA Lovelace Center combines AI research with AI applications in industry in a unique way. Here, partners can network with each other, benefit from each other's expertise, and work on joint projects.