Despite advances in technology, systems reliability is getting worse


With a steady increase in global product recalls, highly publicized fines for automotive companies and the recent Boeing tragedies, it’s clear that reliability is a major issue. According to world-renowned expert Michael Pecht, things are getting worse. In fact, he says many of these problems are completely self-created and entirely avoidable – that is, if companies are ready to get serious about reliability.

“Mike. Buddy. Please. Can you please retract your story? You know, it might be possible for us to fund some of your future research.” This was an offer from the CTO of one of the major players in the field of laminates manufacturing for PCBs. Unfortunately for him, Michael Pecht had just published a report about a faulty automotive engine control unit that used the PCBs the company produced. Pecht, citing the CTO again: “Besides, we buy from over twenty glass fabric vendors and they buy from numerous other companies – mostly in China – and they don’t know or monitor those suppliers.”

The problem for this CTO: hollow glass fibers used during manufacturing were causing leakage currents and intermittent shorts. In some cases, the current and resulting temperature could be high enough to cause a fire. Naturally, they wanted to mitigate the potential damage caused by the release of the report, even if they had to cough up some money to make it go away.

Much to the CTO’s dismay, however, Pecht wasn’t interested in his money – this was about ethics and a passion for his work. “Well, I appreciate the offer, but no. And, in fact, I’m going to publish what you just said.” And that’s exactly what he did. “I’m actually certain that they’re still using many of the old outdated standards and handbooks today and as a result, it’s pretty clear to me that they know little about reliability,” he quipped, reflecting on the conversation.

Recalls and controversies

Professor Michael Pecht is certainly no stranger to this sort of tête-à-tête. He’s a world-renowned expert on systems reliability engineering and testing. Pecht is the founder and director of the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland. Established in 1985, it was the first academic research facility in the world to receive ISO 9001 certification, which is the international standard for quality management systems. Today, CALCE is regarded as one of the founders of the physics-of-failure approach to reliability engineering, offering accelerated testing, electronic parts selection and management, as well as supply chain management to its membership of more than 150 of the world’s leading technology companies. In Pecht’s time with the center, it has continued to grow, employing more than 100 faculty, staff and students, and generating a revenue of six million dollars.

In his work, Pecht has influenced many reliability engineers all over the world. “Mike Pecht set the pace in the area of prognostics and health monitoring (PHM, CA) of electronic devices and systems,” says Willem van Driel, responsible for solid-state lighting reliability at Signify, as well as a professor of reliability at Delft University of Technology. “Without him, PHM would not exist.”

Clemens Lasance, a former principal scientist at Philips, recalls, “When I initiated the electronics cooling activity within Philips CFT in 1984, it was clear that our objective shouldn’t be to calculate, measure or predict temperatures, but their relation to reliability, safety and performance. I then became familiar with Pecht’s publications and I went to visit CALCE.” Later, Philips became a supporting member and adopted CALCE’s physics-of-failure philosophy as the basic approach used by CFT’s Reliability Group. “In short, Pecht’s way of thinking has had a big influence on the way Philips tackled reliability issues,” Lasance adds.

Because of his contributions and expertise in the world of high-tech systems reliability, Pecht was invited as the keynote speaker for the 2019 Trends and Challenges in Reliability seminar in Eindhoven, where he sat down with Bits&Chips for an interview.

In your story on the hollow glass fibers, you highlight how complex the supply chain can be and how markets are becoming more globalized. How does culture influence systems reliability?

“Cultural differences play a major role in the realm of reliability. A different perspective on issues like reliability, safety and liability can really blur the boundaries. In some places in the world, the sale of counterfeit goods is an accepted and very lucrative business. Even though the world has become more globalized, the lack of international standards for products is a big problem. Countries like China or India can set their own standards, which means that supply chains have become increasingly complex. What we’ve found is that companies are continually neglecting to monitor these suppliers, and as a result, there have been some very costly failures. It’s never been more important for a business to keep an eye on its suppliers because no amount of cost-cutting measures will offset the potential fines.”

As technology advances, what’s the trend that you’re seeing in reliability? Are things improving?

“No, quite the opposite. It’s apparent to me that, in terms of reliability issues, things are getting worse. I believe this is evidenced by the continuous growth in recalls and controversies surrounding even the world’s best-known companies – which in some cases have resulted in penalties in the billions.”

For decades, conventional wisdom said that there were various ratings of electronics components, spanning from aerospace and automotive to computer or consumer grade. During this time, the military was known as the worldwide leader in electronics and their products were rigorously tested under extreme environments and circumstances. If the component could withstand this testing, it would receive the military-grade stamp – the highest possible rating.

“Today, you’d just need to open your Samsung or iPhone to see where high technology really is. This sounds great, but it can also be problematic. What has happened is that the companies that used to make military and aerospace components decided not to anymore because it was simply too expensive, especially when consumer goods don’t require such stringent testing. Suddenly, the military found that they have huge obsolescence problems; they just can’t buy the parts any longer.”

Buyer beware

Attending one of Pecht’s lectures on systems reliability, one of the first topics that you hear him speak about is the importance of ethics training. To him, this is an important part of education, both at universities and the workplace. “I think that’s one thing that needs to be more of a priority, especially at companies. If there’s a problem, especially one that could potentially be serious, employees need to be able to not only confront their boss, but perhaps others as well. They absolutely need to have training in that regard.”

Over his three-plus decades of experience in the realm of electronics and systems reliability, Pecht has gained a unique perspective on the way reliability issues arise in various companies. In his mind, this is frequently the result of higher-ups in a business making knee-jerk reactions to claims from the competition. The message of, ‘We need X because our competitors are doing Y and we’re falling behind,’ or, ‘We’ve got to solve this problem and it’s got to be done by next week’ is one that is all too often heard. At times it can even be, ‘Solve this problem any way you can.’

“The problem then becomes the immense strain coming from the top-down. When people are working under this sort of pressure they sometimes think, ‘I can’t solve it technically, so I’m going to have to solve it in some other manner,’ and this is where we often find that some sort of fraud takes place. For instance, changing software and applying quick ‘fixes’ of that nature, rather than really addressing the underlying issues. It creates this sort of buyer beware environment. And that’s exactly it – we do have to beware.”

Do you attribute these issues of fraud to cost-cutting and the race to the market?

“That’s not the only reason, but yes, in many instances this is the case. A good example of this is the current situation with the Boeing 737 Max. What seems to have happened is, Airbus announced they had developed a new engine design that was more efficient and offered a much better fuel economy. In a rush to avoid losing sales, Boeing almost immediately announced that they, too, were going to go in the same direction and would be using a similar engine design.”

Unfortunately for Boeing, this ‘new’ model plane is a modified version of the 737-100, which came to fame in the late 60s, during the Cold War, as a short-haul commuter jet and was designed fundamentally differently from its Airbus rival. Boeing specifically intended for this plane to sit low to the tarmac to allow foldable stairs to reach the ground, making entry and exit more efficient. Airbus’s plane, however, was built to sit much higher off the ground. For this reason, the same engine used by Airbus couldn’t effectively fit under the low wing of the 737, which meant that the placement of the turbine had to be adapted.

“This alteration of the engine positioning threw the whole system out of balance and affected the whole aerodynamics of the plane. As a speedy fix to this new problem, which was completely self-created, Boeing decided to implement new software to address the issue. But then, the software had to rely on a sensor, which apparently had no redundancy. In order to overcome this lack of redundancy, their view was simple – the pilot will take care of it.”

“You can see how it just continued to snowball. Each of these issues was stacking up, and the decisions in response were becoming more questionable. There was absolutely no focus on reliability or testing; it was simply a race to the market, even when being met with matters that were extremely critical.”

This is particularly troublesome because it’s well-known that Boeing has been making cost-cutting maneuvers in order to increase shareholder value. Since 2015, the aerospace juggernaut has cut its workforce by about seven percent, while simultaneously pushing for an increase in production. This includes a cut of fifty percent to the Flight Crew Operations team, which is directly responsible for managing pilot interaction with the airplane’s software and system controls – the very issue that is now thought to have wreaked havoc on the ill-fated Ethiopian Airlines and Lion Air flights.

Profit seems to be the driving force behind a lot of your examples of product failure, some of them costing corporations billions of dollars. What then is the benefit to a company of keeping these issues quiet, or applying quick fixes, if it still results in such penalties?

“Well, I don’t know exactly. But it appears that on some level, these businesses believe they’re not going to get caught. You’ve got to remember: these companies are smart and they’re fast on their feet to make changes. I think it’s their view that by making a software change to address a problem, people aren’t really going to notice.”

“While this may have been true twenty years ago, the same can’t really be said for today. In this era of mass production, there’s always somebody who will do something crazy as soon as a new device comes out. Whether it’s the ‘bend test’ on a new phone, putting new electronics in microwaves or blenders, or smashing them with hammers – customers will go to extremes to push the limits of today’s technology. Consumers nowadays are also more tech savvy and more willing to dive into the systems and processes, which is why I think there’s a greater chance than ever that these reliability issues will be discovered.”

An example that Pecht often uses is the Volkswagen emissions scandal. Cars from VW, one of the largest and most-respected auto companies in the world, were found to be emitting nearly forty times more NOx than the company claimed. Thanks to so-called ‘defeat software’, the car’s computer could determine when the vehicle was being tested rather than driven. During tests, it would reduce emissions so they registered below the Environmental Protection Agency’s requirements. Of course, the corporation denied these accusations for more than a year, but ultimately, it was punished to the tune of 2.4 billion dollars in fines. It was third parties, not Volkswagen and not the government, that discovered the cheat.

If a company sees the benefit and is serious about testing and reliability, what’s the proper balance? They obviously can’t be testing indefinitely, so what would your advice be?

“Well, you’re asking the right question, but there’s no uniform answer. I think you would start by hiring a consultant or group to do an expert return on investment analysis, and then use this to make that important decision. There’s no rule of thumb of a certain number of engineers or a specific percentage of the budget. The real answer to this question is that you must first determine your risk of failure, and, of course, what the consequences could be. If you’re making bicycle parts, it’s totally different than something with medical applications.”
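To make the reasoning in that answer concrete, here is a minimal sketch of how risk of failure and consequence can be combined into a rough return-on-investment figure. It is not a method described by Pecht or CALCE; the function names and every number are hypothetical, chosen only to contrast a low-consequence product with a high-consequence one.

```python
def expected_failure_cost(failure_probability, cost_per_failure, units):
    """Expected loss: probability of failure x consequence x units shipped."""
    return failure_probability * cost_per_failure * units


def reliability_roi(program_cost, p_before, p_after, cost_per_failure, units):
    """Return on investment of a reliability program.

    Positive when the avoided failure cost exceeds the program cost.
    """
    avoided = (expected_failure_cost(p_before, cost_per_failure, units)
               - expected_failure_cost(p_after, cost_per_failure, units))
    return (avoided - program_cost) / program_cost


# Hypothetical figures: the same program and the same reduction in failure
# probability, applied to two products whose failures cost very different amounts.
bicycle_part = reliability_roi(200_000, 0.02, 0.005, cost_per_failure=50, units=100_000)
medical_device = reliability_roi(200_000, 0.02, 0.005, cost_per_failure=5_000, units=100_000)

print(f"bicycle part ROI:   {bicycle_part:+.2f}")
print(f"medical device ROI: {medical_device:+.2f}")
```

With these assumed inputs, the identical program loses money for the bicycle part and pays for itself many times over for the medical device, which is why no single percentage of budget fits every company.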

It seems rather difficult to put a value on the money saved through reliability. In software or hardware testing, you can make a direct relation to the number of bugs that are found. But how does one account for that in reliability?

“That’s exactly right, it is difficult. In my view, a good reliability person needs to explain what the benefit of their work will be. Somehow, they need to be able to convey the message that, if reliability isn’t done, there’s a very good chance that the company could end up paying tens of millions or hundreds of millions or billions of dollars.”

“I don’t actually encourage anybody to necessarily go into the field of reliability. It can be a very thankless job, and nobody really loves you. You’ll find yourself routinely saying, ‘Let’s delay, we need to do more analysis or testing,’ and that can be trying. You really need to have a resistance to stress.”

Where do you see the future for engineers in the field of systems reliability?

“I hope to see the continued growth of the physics-of-failure approach to really determining and understanding the actual failure mechanisms in a system. Additionally, over the last several years, we have begun working with companies like Dell to analyze and understand individual types of components, the way that they operate and what causes them to fail, and we have developed a conference on prognostics and systems health management. And of course, putting artificial intelligence into our products to enable them to forecast problems is a key to future product reliability and safety.”

“It might be easier to think about this in terms of people. Every person is different, each with their own diet, habits and activities. What we’d like to do is put a doctor, so to speak, on the shoulder of each type of person. To get a full measurement and analysis of the health system of the individual and how it operates in a larger environment with others. Then you can really understand how a system operates, develop an in-depth profile, sense anomalous behavior, prevent failures and forecast maintenance as needed.”
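As a rough illustration of the ‘doctor on the shoulder’ idea, the sketch below builds a profile from a unit’s own early readings and flags later readings that stray too far from it. This is a deliberately simple z-score check, not CALCE’s prognostics method; the function name, the parameters and the sample temperatures are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(readings, baseline_size=20, threshold=3.0):
    """Return the indices of readings that deviate from the unit's own baseline.

    The first `baseline_size` readings are assumed to come from a healthy
    unit and define its individual profile (an assumption of this sketch).
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # avoid dividing by zero on a flat baseline
    return [
        index
        for index, value in enumerate(readings[baseline_size:], start=baseline_size)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical temperature log in degrees Celsius: 20 healthy readings,
# then four new ones, one of which is a clear excursion.
healthy = [50, 51, 50, 49, 50] * 4
log = healthy + [51, 50, 57, 50]

print(find_anomalies(log))
```

Because the baseline comes from the individual unit rather than from a fleet-wide average, the same reading can be normal for one device and anomalous for another, which is the point of the analogy.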