Using AI to play a serious game of supply chain optimization

Nieke Roos

TUE researcher Willem van Jaarsveld is developing software to optimally control the supply of spare parts in the high-tech industry. He was inspired to use reinforcement learning by the computer program that defeated the human world champion at the board game Go.

“Every hour a wafer scanner is idle costs a chipmaker millions. That’s why it’s crucial for a company like ASML to have a replacement part on-site as soon as possible when something breaks in one of its systems,” explains Willem van Jaarsveld. “But it’s just as important to get the supply chain back in order so that there are no supply problems if the same part is needed somewhere else later on.”

Van Jaarsveld is a process optimization expert in the field of maintenance logistics. He’s an associate professor at Eindhoven University of Technology, in the Operations Planning Accounting & Control group (OPAC) of the Faculty of Industrial Engineering & Innovation Sciences. In addition, he’s a research director at the European Supply Chain Forum, a TUE-affiliated competence center that collaborates with the Eindhoven Artificial Intelligence Systems Institute (EAISI) and the High Tech Systems Center (HTSC). In both roles, he conducts research into the optimal management of the high-tech supply chain using machine learning techniques.

Reinforcement learning

“How can you ensure, at the lowest possible cost, that a machine is operational almost 100 percent of the time?” says Van Jaarsveld, outlining the problem he’s working on – a problem that quickly becomes very complex. “High-tech often involves a lot of different, complex parts. You don’t want to have a lot of these parts in stock because they’re so expensive. It’s also simply not possible because suppliers only have a limited capacity to build them. The users of the machines are all over the world, so your warehouses are too, and crucial parts often make a precautionary trip to different customers before they’re actually needed somewhere.”

Van Jaarsveld continues: “Such supply chains are highly dynamic: every so often you receive new information and have to deal with unforeseen events. Nothing ever goes completely according to plan. The large, global scale also makes it extremely difficult to keep track of everything. Traditional optimization strategies fall short here, but it’s right up the alley of artificial intelligence.”

No wonder, then, that AI researchers have enthusiastically thrown themselves into the problem. Initially, a little too enthusiastically for Van Jaarsveld’s taste. “In the early days, everyone thought everything could be solved with artificial intelligence. In maintenance logistics, too, I’ve seen all kinds of techniques being applied almost blindly. Just connect your ERP system to AI and it will magically learn all sorts of clever tricks. I don’t believe in that. These techniques need tens of millions of data points for sensible pattern recognition. In the highly dynamic practice of logistics, you’re lucky to have a thousand.”

This didn’t stop Van Jaarsveld from working with AI himself, specifically with a technique called reinforcement learning. “Google DeepMind has used this to create software that has taught itself to play all sorts of difficult games. I’d long been looking for methods better suited to real-time control of logistics chains. When I saw one of those DeepMind programs, AlphaGo, defeat the human world champion at Go, I knew right away: reinforcement learning is the method. Starting in 2018, I went all-in on this research direction.”

An example of an agent playing the game. After several failures in a row, customers in Taiwan and Hong Kong no longer have a spare of a specific part in a nearby warehouse. The agent decides to bring that part to Singapore because from there, both locations can be reached quickly. Credit: Willem van Jaarsveld
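The reasoning in this example can be illustrated with a deliberately simple rule: stage the part at the warehouse whose slowest trip to any affected customer is shortest. This is a hand-written minimax heuristic for illustration only, not the trained agent, which learns its choices and weighs many more factors; the warehouse list and travel times below are made-up assumptions.

```python
# Illustrative only: a hand-written minimax rule, NOT the trained agent.
# The travel times (in hours) are made-up numbers for this example.
TRAVEL_HOURS = {
    "Singapore": {"Taiwan": 5, "Hong Kong": 4},
    "Tokyo": {"Taiwan": 4, "Hong Kong": 6},
    "Sydney": {"Taiwan": 9, "Hong Kong": 9},
}


def best_staging_warehouse(travel_hours, customers):
    """Return the warehouse whose slowest trip to any of the customers is shortest."""
    return min(
        travel_hours,
        key=lambda wh: max(travel_hours[wh][c] for c in customers),
    )


print(best_staging_warehouse(TRAVEL_HOURS, ["Taiwan", "Hong Kong"]))  # Singapore
```

With these numbers Singapore wins because its worst case (5 hours) beats Tokyo (6) and Sydney (9), even though Tokyo is closer to Taiwan alone.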

Serious game

In reinforcement learning, a so-called software agent learns good strategies by interacting with an environment, usually a simulation. It does this by performing actions for which it’s rewarded or punished. If it’s rewarded, it knows that it’s on the right track and can build on that; if it’s punished, it has to change course. By trying out many moves, the agent learns which ones are smart in which situations. With the aid of a neural network, it can generalize the strategy to circumstances it didn’t encounter in the simulation.

Van Jaarsveld applies this approach by turning a challenge from logistics practice into a so-called serious game, developing an agent for it and training the agent in the game. The game consists of a set of rules (the dynamics), the actions that the agent can perform and a reward (or punishment). The game rules are refined iteratively until the game is a sufficiently accurate representation of reality. At that point, the agent is ready to step into the real world and make real decisions.
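A minimal sketch of the agent, game and reward described in the two paragraphs above, assuming a deliberately tiny game: one warehouse holding zero to two spares of a single part. The dynamics (`step`), the two actions and the reward (the negative of the cost) are hypothetical, as are all cost figures. The agent uses tabular Q-learning, a basic reinforcement learning method, in place of the neural network that real applications use to generalize.

```python
import random

# Hypothetical toy game: one warehouse holding 0..MAX_STOCK spares of one part.
MAX_STOCK = 2
ACTIONS = (0, 1)       # 0 = do nothing, 1 = ship one spare in proactively
FAILURE_PROB = 0.3     # chance per period that a machine needs the part
SHIP_COST, HOLD_COST, EMERGENCY_COST = 1.0, 0.1, 8.0  # made-up cost figures


def step(stock, action, rng):
    """Game rules: apply the action, then maybe a failure; return (next_stock, reward)."""
    stock = min(stock + action, MAX_STOCK)
    cost = SHIP_COST * action + HOLD_COST * stock
    if rng.random() < FAILURE_PROB:
        if stock > 0:
            stock -= 1               # a spare is available nearby
        else:
            cost += EMERGENCY_COST   # no spare: expensive last-minute delivery
    return stock, -cost              # reward is the negative cost (a punishment)


def q_learning(steps=100_000, alpha=0.02, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values q[stock][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(MAX_STOCK + 1)]
    stock = 0
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-known action.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[stock][a])
        next_stock, reward = step(stock, action, rng)
        # Nudge the estimate towards reward + discounted value of the next state.
        target = reward + gamma * max(q[next_stock])
        q[stock][action] += alpha * (target - q[stock][action])
        stock = next_stock
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(MAX_STOCK + 1)]
print(policy)
```

With these made-up numbers the agent should end up shipping a spare only when the shelf is empty: one spare in stock avoids the expensive emergency delivery, while a second spare mostly adds holding cost.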

“In a typical high-tech supply chain, the playing field consists of customer sites, where the machines are located, and warehouses, holding the parts for those machines,” Van Jaarsveld gives as an example. “Worldwide, the machine supplier has a few strategic warehouses. These exchange goods with each other and distribute them further to more regional forward warehouses, which directly supply the customers. The replacement parts enter this network at the central warehouse. From there, they’re distributed around the world via so-called transshipments.”
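One way to represent this playing field in a simulation is as a directed graph of locations, with each edge a possible shipment leg. The sketch below assumes hypothetical location names, a separate central node feeding the strategic warehouses, and lateral links between the strategic warehouses. It only finds the route with the fewest legs using breadth-first search; the agent’s actual task of deciding which transshipments to make, and when, is not modelled here.

```python
from collections import deque

# Hypothetical network: names and links are illustrative, not a real supply chain.
LINKS = {
    "central": ["strategic_eu", "strategic_asia", "strategic_us"],
    "strategic_eu": ["strategic_asia", "strategic_us", "forward_nl"],
    "strategic_asia": ["strategic_eu", "strategic_us", "forward_tw", "forward_kr"],
    "strategic_us": ["strategic_eu", "strategic_asia", "forward_az"],
    "forward_nl": ["customer_nl"],
    "forward_tw": ["customer_tw"],
    "forward_kr": ["customer_kr"],
    "forward_az": ["customer_az"],
}


def shipment_route(links, source, target):
    """Breadth-first search: return the route with the fewest shipment legs, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the route.
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbour in links.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # target unreachable from source


print(shipment_route(LINKS, "central", "customer_tw"))
```

Keeping the network as plain adjacency lists makes it easy to add or remove warehouses when the game rules are refined.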

“Based on the number of parts that suppliers can produce per year and the failure frequency of those parts, derived from the mean time between failures and data from the installed base, you can now play the game. You simulate all kinds of problem situations and each time, you look at how many transshipments are needed to get the chain back in order. You start simple, but then you add all kinds of complexities, for example that you prefer to ship proactively rather than last-minute because last-minute shipping is eight times as expensive, or that one of the customers must be given priority because you need to make up for an earlier slip-up.”
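The demand side of the game and the shipping trade-off can be sketched with simple arithmetic. Assuming one copy of the part per machine, round-the-clock operation and a constant failure rate of one per MTBF, the expected number of failures per year is the installed base times the operating hours per year, divided by the MTBF. The helper names, the installed base of 100 machines and the MTBF of 43,800 hours are hypothetical; only the factor of eight for last-minute shipping comes from the text.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: machines operate around the clock


def annual_failures(installed_base, mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Expected failures per year, assuming one copy of the part per machine."""
    return installed_base * hours_per_year / mtbf_hours


def shipping_cost(n_shipments, share_last_minute, proactive_cost=1.0, rush_factor=8.0):
    """Total cost when a share of shipments is made last-minute at rush_factor times the price."""
    rushed = n_shipments * share_last_minute
    planned = n_shipments - rushed
    return planned * proactive_cost + rushed * proactive_cost * rush_factor


failures = annual_failures(installed_base=100, mtbf_hours=43_800)
print(failures)                      # 20.0
print(shipping_cost(failures, 0.1))  # 34.0
print(shipping_cost(failures, 0.5))  # 90.0
```

Under these assumptions, 20 expected failures a year cost 34 units to handle when only 10 percent of the shipments are rushed, against 90 units when half of them are, which is why the game rewards shipping proactively.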

Willem van Jaarsveld uses reinforcement learning in his research on real-time control of logistics chains. Credit: Angeline Swinkels


Commissioned by ASML, Van Jaarsveld developed this idea into a proof of concept, delivered in 2019. Together with TUE colleague Yingqian Zhang, master’s student Valentin Dmitrochenko and ASML project leader Douniel Lamghari-Idrissi, he compared the results with an approach developed by the company itself. Van Jaarsveld: “At ASML, they were reasonably satisfied with their own work, but they also had the feeling that it could be improved. And that feeling proved justified: our method yielded considerable additional savings. Impressed by the fact that we’d beaten their approach and that we could also explain how, they agreed to join a big follow-up project.”

This Dynaplex project started last August under the leadership of Van Jaarsveld, Zhang and a third TUE colleague, Remco Dijkman. With a subsidy from the Top Consortium for Knowledge and Innovation Dinalog (the Dutch Institute for Advanced Logistics), they’re going to work towards full-scale application at ASML over the next four years. “With the master’s student, we looked at a few components and about five warehouses; now we want to scale up to the entire chain and demonstrate that it works in a simulation environment. Then we can move into practice.”

In addition to the high-tech supply chain, Van Jaarsveld sees other promising use cases. “Scheduling in the semicon front-end, for example. There, wafers make several rounds through different machines. These schedules are drawn up for longer periods, but sometimes they have to be thrown out after just a few hours. Although we’re still at the stage we were at two years ago with ASML, we’re hopeful that we’ll be able to dynamically determine the optimal route for each wafer at each step.”

“Our approach is broadly applicable,” states Van Jaarsveld. “It used to take a lot of puzzling and sometimes years of research to tackle a specific problem, but our algorithms are very easy to hook up to various cases. It’s as simple as plug-and-play. I see a lot of potential there for the future. In the projects we’re doing now, we’re also really trying to work on a plug-and-play architecture so that other companies will eventually benefit as well: for instance, companies in transport logistics such as Ewals Cargo Care and Koninklijke Den Hartogh. As participants in Dynaplex, they hope to reap the benefits of the project later on.”

This article was written in close collaboration with Eindhoven University of Technology’s High Tech Systems Center. Main picture credit: Angeline Swinkels
