EE Times has a Special Project report out on growing concerns over technology reliability and safety, and it is pretty sobering. The report, compiled and in good part written by George Leopold, includes a couple of case studies from either end of the technology spectrum – hoverboards and Boeing 737s – as well as some welcome suggestions for what to do about this critical issue.
Leopold begins by asking how we got ourselves into this situation:
Some point to lax regulation, others cite “cultural laziness” borne of buggy software releases that result in endless patches. Have engineering principles like built-in redundancy in mission-critical systems been compromised by market pressures?
It certainly looks that way, a variety of experts tell us.
The Hoverboard Crisis: Junko Yoshida took an in-depth look at the hoverboard crisis of 2015, when these new-to-market and wildly popular items began exploding due to lithium-ion battery issues that caused “thermal runaway.” Yoshida sees the problem – which resulted in a recall of half a million hoverboards by mid-2016 – as the result of a “perfect storm of market forces”:
- A new product category with no industry behind it
- An overnight market as interest took off (hoverboards became the “it” gift for the 2015 holidays)
- New market entrants, pressured to design and build quickly (with no standards to draw on)
Interestingly, Underwriters Laboratories (UL) was in rapid response mode, and within months was able to roll out a hoverboard certification program. Within a year, UL had “issued a ‘consensus’ regional standard for hoverboard safety in the US and Canada.” Once manufacturers started adopting the standard, the issues with hoverboards largely went away. (Not covered in Yoshida’s article, but an interesting side note: I read recently that there have been a number of instances of e-cigarette/vaping device batteries catching fire during flights. Some airlines have banned them in checked baggage.)
Boeing 737 Max Crashes: Then there were the two Boeing 737 Max crashes, which resulted in loss of life and were attributed to malfunctioning software. Again, the culprit was at least in part market pressure. But now the market pressures are swinging in the opposite direction, with many fliers refusing to book flights on a 737 Max once the planes are back in service.
And it’s not just market pressure that let shoddy code out the door. Some see the software development community as having gotten so used to shipping out patches each week that quality has become less important.
Nor is it just software failures causing problems. The FAA has also discovered a potential hardware issue “with the 737 Max flight control computer. The fault reportedly involves the random flipping of bits in the microprocessor, likely caused by radiation striking chip circuitry.” That’s not good.
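That kind of single-event upset is exactly what the “built-in redundancy in mission-critical systems” mentioned above is meant to absorb. As a rough illustration only – not Boeing’s actual design – here is a minimal sketch in C of one classic mitigation, a triple modular redundancy (TMR) majority vote that masks a flipped bit in any one of three stored copies of a value:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch only: bitwise triple modular redundancy (TMR).
 * A critical value is kept in three copies; a per-bit majority vote
 * masks a single radiation-induced bit flip in any one copy.
 */
static uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    /* A bit survives if at least two of the three copies agree on it. */
    return (a & b) | (a & c) | (b & c);
}

int main(void)
{
    uint32_t copy_a = 0x12345678u;
    uint32_t copy_b = 0x12345678u;
    uint32_t copy_c = 0x12345678u ^ (1u << 9); /* one bit flipped in this copy */

    /* The corrupted bit in copy_c is outvoted; prints 0x12345678. */
    printf("voted value: 0x%08x\n", tmr_vote(copy_a, copy_b, copy_c));
    return 0;
}
```

Real flight control computers go far beyond this simple trick, of course, but the principle is the same: no single flipped bit should get to decide anything on its own.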
Nearly 350 people were killed in the two Boeing 737 Max crashes. The upside, if there is one, is stronger regulatory oversight, and the hope that other manufacturers – looking at Boeing’s hefty costs and hits to its reputation – will start taking greater care.
Let’s hope so. When it comes to reliability and safety, none of us want to leave home without it.
This is the first in a two-part series based on the EE Times/AspenCore Special Project on technology reliability and safety. The next post will focus on solutions.