For a highly complex machine like the Twinscan to be able to operate smoothly, its system control should run without any unnecessary interruptions. Within the Concerto project, ASML, ESI (TNO) and TUE have developed a model-based methodology to analyze the software execution and keep computational tasks out of the critical path as much as possible. The partners see great potential for the approach to be widely adopted in the high-tech industry.
In ASML’s lithographic systems, the Twinscan stage simultaneously moves two tables, each holding a silicon wafer. While one wafer is being exposed to – deep or extreme – ultraviolet light containing the chip pattern to be printed, the other is measured by the machine’s metrology sensors to optimize alignment. The tables are propelled electromagnetically, allowing frictionless acceleration as high as 7G.
Every move the Twinscan stage makes has been precisely calculated by the system’s software. To ensure a smooth journey from A to B and prevent a wafer table from missing a turn, the computations need to be completed in time. “Imagine you’re on the highway, following the instructions of your navigation system,” says ASML’s Jos Vaassen. “If the system takes too much time to calculate the route, you’re going to drive right past your exit. Likewise, we don’t want our scanners to miss a turn because our software is missing a deadline.”
As the chip patterns to be printed continue to shrink, however, the lithographic scanner grows ever more complex, requiring an increasing amount of computations to get the job done. This raises the likelihood of missing a turn and having to stop for some time to recalculate and get back on track. Such an interruption has widespread consequences. For example, it affects the focus of the lens system and the alignment of two subsequent chip layers. Ultimately, it will impact the machine’s performance.
To prevent that from happening, the computations, realized in software, need to be continuously monitored. With that goal in mind, ASML and TNO’s high-tech joint innovation center ESI set up the Concerto project in 2016, together with Eindhoven University of Technology (TUE). In four years, they developed a model-based methodology to diagnose, predict and optimize system timing and throughput and to keep computational tasks out of the critical path as much as possible.
“A Twinscan machine contains a great number of software components,” explains Jeroen Voeten, professor of Cyber-Physical Systems at TUE, who initiated the Concerto project while working as a research fellow at ESI. “All those components are doing their share in completing the computational work at hand. Finding out which of them is causing the delay when the calculations are taking too much time is a daunting task, almost impossible to do manually because of the sheer number of components. The tooling developed in Concerto makes it possible to quickly pinpoint the root cause. Knowing this cause is the first step in fixing the problem.”
A delay in machine execution can be due to a simple software error. A problem like that is generally easy to fix, says Joost Gevers, ASML’s software product architect responsible for the installed base of NXT systems in the field. More challenging are delays caused by a tight processing budget, ie when the system’s to-do list is pushing the limits of the available compute power. “When timing budgets aren’t met, we could remove some of the computational tasks from the critical path by executing them earlier.”
Fellow software product architect Vaassen, during Concerto responsible for the installed base, emphasizes that fixing the problems was outside the scope of the project. “The tooling developed focuses on finding the critical path and the components on it. Once that has been mapped out, it’s up to the engineers to resolve the performance bottlenecks.”
“We’ve developed a model-based approach to do the root cause analysis,” recaps Bram van der Sanden, ESI research fellow and liaison between ASML and TUE. “The tooling constructs an overview of the system’s execution over time, showing which component is performing which task at what moment. By analyzing this execution, it can then show the locations of the bottlenecks. This provides insight into the design changes that need to be made to ensure a smooth operation.”
The approach developed within Concerto starts with collecting data about the system execution. “We do that by instrumenting the machine software at strategic locations, ie by adding little bits of measurement code there,” clarifies Van der Sanden. “For every component, it allows us to track the starting and stopping times of the functions being executed and the messages being passed to other components. This gives us the traces we need to automatically create the formal behavioral models with which we can analyze the system’s performance.”
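The kind of trace this instrumentation produces can be illustrated in miniature. The sketch below is hypothetical and does not reflect ASML’s actual implementation (which instruments interfaces and middleware rather than individual functions, as explained next); it merely shows how start/stop events per component accumulate into a trace:

```python
import time
from functools import wraps

TRACE = []  # in-memory event log; a real system would stream to storage

def traced(component):
    """Wrap a function so its start and stop times are recorded,
    tagged with the component it belongs to. Illustrative sketch only."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            TRACE.append((component, fn.__name__, "start", time.perf_counter()))
            try:
                return fn(*args, **kwargs)
            finally:
                TRACE.append((component, fn.__name__, "stop", time.perf_counter()))
        return wrapper
    return decorator

@traced("metrology")           # hypothetical component name
def measure_wafer():
    time.sleep(0.01)           # stand-in for real measurement work

measure_wafer()
# TRACE now holds one "start" and one "stop" event for this call
```

From a stream of such events, per-component timelines and inter-component message dependencies can be reconstructed after the fact.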
“We’re not instrumenting components individually,” underscores Van der Sanden. “We’re instrumenting function interfaces and the middleware through which they talk to one another.” TUE’s Voeten adds: “Because ASML has a nice component-based architecture with a decent middleware layer, that communication is readily accessible for automatic instrumentation. That’s very important – doing it by hand would be too much work and it would be very hard to get the required information out.”
“As we’re looking to find timing bottlenecks, it’s also key that we don’t interrupt the real-time performance ourselves,” Voeten goes on to point out. “So we’ve made sure the instrumentation is as non-intrusive as possible, having negligible impact on the system’s operation. Thanks to a very efficient implementation, it’s now ready to run on systems in the field.” ASML’s Vaassen: “We can’t have our customers losing productivity because we’ve added instrumentation to our software, which is why we really took our time to minimize the impact.”
From the log data, the Concerto tooling generates so-called timed message sequence charts. These are Gantt-like diagrams, plotting the software components against the functions they execute over time, supplemented with arrows depicting task dependencies. “The Twinscan’s highly repetitive work, for example, is clearly reflected as frequently reoccurring task groupings,” illustrates Voeten. “The charts map out the complex interplay between components in an insightful way. They perfectly fit ASML’s architecture, and software architects and engineers are already used to working with them in the system specification phase – they’re very common in design documentation. Contrary to standard practice, however, we’re generating them, after the fact, from execution data.”
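The Gantt-like shape of such a chart can be sketched in a few lines. This is a deliberately crude text rendering of hypothetical trace tuples, without the dependency arrows the real charts carry; component and task names are made up:

```python
def gantt_chart(events, width=40):
    """events: list of (component, task, start, stop) tuples.
    Returns one text row per component, with '#' marking the
    intervals in which a task executes. Illustrative only."""
    t0 = min(e[2] for e in events)
    t1 = max(e[3] for e in events)
    span = (t1 - t0) or 1.0
    rows = {}
    for comp, task, start, stop in events:
        row = rows.setdefault(comp, [" "] * width)
        a = int((start - t0) / span * (width - 1))
        b = int((stop - t0) / span * (width - 1))
        for i in range(a, b + 1):
            row[i] = "#"
    return "\n".join(f"{c:>10} |{''.join(r)}|" for c, r in rows.items())

# hypothetical trace of one expose/measure cycle
events = [
    ("expose",    "expose_die",  0.0, 4.0),
    ("metrology", "align_wafer", 0.0, 2.5),
    ("stage",     "swap_tables", 4.0, 5.0),
]
print(gantt_chart(events))
```

Repetitive machine behavior shows up in such a plot as the same grouping of bars recurring along the time axis, which is what makes anomalies stand out.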
The timed message sequence charts provide the formal foundation for the final step of automated performance analysis. “We can apply different mathematical techniques to them,” notes Voeten, “not only to calculate the critical path and find the root causes of bottlenecks but even to formally verify system properties.”
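Critical-path calculation over such a chart amounts to a longest-path computation on the task dependency graph. A minimal sketch, with invented task names and durations (the article does not describe ASML’s actual algorithm):

```python
def critical_path(durations, deps):
    """durations: task -> execution time; deps: task -> prerequisite
    tasks. Returns (makespan, path): the longest dependency chain,
    i.e. the tasks that directly determine end-to-end timing."""
    finish, pred = {}, {}

    def f(task):
        # earliest finish time = own duration + latest prerequisite finish
        if task not in finish:
            prev = max(deps.get(task, []), key=lambda p: f(p), default=None)
            finish[task] = durations[task] + (f(prev) if prev else 0.0)
            pred[task] = prev
        return finish[task]

    end = max(durations, key=f)
    path, t = [], end
    while t:
        path.append(t)
        t = pred[t]
    return finish[end], path[::-1]

# hypothetical task graph: measure, then compute, then expose
durations = {"measure": 2, "compute": 3, "expose": 4}
deps = {"compute": ["measure"], "expose": ["compute"]}
print(critical_path(durations, deps))  # -> (9.0, ['measure', 'compute', 'expose'])
```

Shortening any task on the returned path shortens the whole execution; shortening a task off the path does not, which is exactly why finding the path is the first step in bottleneck removal.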
The tooling is currently being industrialized at ASML. “We haven’t fully deployed it yet but in the pilot phase, it has already helped us uncover a couple of bottlenecks,” states Vaassen. “For example, when two tasks communicate, they can do so on a fire-and-forget basis: they send each other messages and continue their business without waiting for a reply. When the message is too big to be conveyed in one go, however, it gets chopped up into multiple parts, which the sender cannot just fire and forget anymore; it has to wait for a reply before it can send the next part. Thus, fire-and-forget can still result in a task being blocked. Since the interface for sending messages is abstract, this isn’t visible in the code. Concerto has really opened our eyes to these kinds of potential problems.”
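The fire-and-forget pitfall Vaassen describes can be captured in a toy model. The chunk limit and function below are hypothetical, not ASML’s messaging layer; the point is that the number of blocking waits jumps from zero to one-per-extra-chunk as soon as a message exceeds the transport limit:

```python
CHUNK = 1024  # hypothetical maximum payload per transfer, in bytes

def blocking_waits(message_size, chunk=CHUNK):
    """How often the sender must wait for a reply: zero when the
    message fits in one transfer (true fire-and-forget), otherwise
    once per additional chunk, since each part must be acknowledged
    before the next can be sent."""
    chunks = -(-message_size // chunk)  # ceiling division
    return max(chunks - 1, 0)

print(blocking_waits(512))   # fits in one transfer: no waits
print(blocking_waits(4096))  # four chunks: three blocking waits
```

Because the send interface hides the chunking, nothing in the calling code hints at these waits, which is why they only surface in an execution trace.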
“We don’t have to rely so much anymore on good fortune and in-depth knowledge to find a bottleneck in a day or two,” says Vaassen, summarizing what he sees as the project’s main added value. “Without having to dive into the design documentation to determine the exact configuration, we can get an overview of what’s going on in a system. The tooling can just generate that by looking at the execution. It accelerates problem-solving.”
Vaassen’s colleague Gevers concurs: “By giving us the complete picture, it allows us to more easily pinpoint performance bottlenecks. We have an excellent proof of concept, showing that it really works. I’d like to see ASML invest big in rolling it out to the company’s entire software community – with the ultimate goal of using it to analyze execution data collected in the field and fix issues at customers in a heartbeat.”
To Voeten, the project marks a major milestone for the model-based paradigm. “For decades, we’ve been trying to get the industry to create models for specification and code generation – to little avail. With Concerto, we’ve moved to automatically generating them from complex systems – the right side of the V model – and we’re already gaining traction. We’ve managed to connect twenty years of academic research to the high-tech practice. Bringing the ability to efficiently analyze millions of lines of code in a day, I think this has real potential of catching on.”
ESI’s Van der Sanden, too, sees great benefits for the ecosystem. “We’ve developed a model-based methodology to quickly and systematically assess the impact of timing variations. We intend to open it up to other companies. Those with a similar component-based architecture, such as Philips or Thermo Fisher Scientific, could benefit from it as well.”
Meanwhile, the Concerto partners have teamed up once more in a follow-up project, called Maestro. “We started at the end of 2019, again for four years,” says Van der Sanden. “One of the aspects we’re working on is raising the abstraction level and going from software to system tasks. By enriching the generated Concerto models with multidisciplinary domain knowledge, we want to be able to do a machine-level diagnosis, pinpoint the bottlenecks in system functionality and then zoom in and run a root cause analysis on the associated software tasks.”
Collaborators also want to loop back to the left side of the V model, the system specification. Returning the message sequence diagrams to their natural habitat, so to speak. “After that abstraction step to describing system activities, we’re looking to take it one step further, to specifying system behavior and using that in the development process to make predictions,” philosophizes Van der Sanden. “That would allow us to ask questions like: what would be the impact on system timing if we were to change the order of tasks or even add some new computations? Would that require us to add more processing power or are there other ways to keep the system on track? Being able to run what-if scenarios like this is another long-term objective of Maestro.”