We’re more reliant on software than ever before. As more enterprises go digital, the demand for software solutions increases. Because software is so ubiquitous, it’s easy to forget how complex it can be and why airtight software engineering is so important.
Software engineering approaches the development process using engineering principles. The focus is on creating robust development architecture, using the best tools for the project, and working within established parameters to meet client expectations.
When Software Engineering Goes Wrong
As technology becomes more widely adopted, the need for well-designed and secure software systems increases. When sound software engineering principles aren’t prioritized, bugs and design flaws can slip through the cracks. Software failures can manifest as mildly annoying feature disruptions, unexpected hardware behavior, or even crippling system crashes.
The costs of software failures can be devastating. This is why monitoring software quality, tracking bugs, and testing are so important in software engineering. Here are 5 of the biggest software failures, and what you can learn from them.
Therac-25 Radiotherapy Bug
Radiotherapy is a common treatment for certain cancers. Radiotherapy machines deliver precise doses of radiation to destroy cancerous growths. The Therac-25 was one such machine.
Unlike its predecessors, which included hardware interlocks for safety, the Therac-25 relied on software controls to manage the radiation output during treatment. This proved fatal when a race condition in the software resulted in several deaths from radiation overdose.
Investigation into the software failure found that the Therac-25’s control system had several glaring flaws. These included insufficient timing analysis, unit testing, and fault-tree analysis.
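A race condition of this kind typically arises when a program checks a value and then acts on it while another task can still change that value in between. The sketch below is not the Therac-25’s code; the class, method names, and modes are hypothetical and only illustrate one common safeguard: holding a single lock across both the check and the action, and refusing to proceed when the requested setup and the physical setup disagree.

```python
import threading


class BeamController:
    """Hypothetical controller; names are illustrative, not from the Therac-25."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.requested_mode = "electron"  # what the operator asked for
        self.hardware_mode = "electron"   # what the hardware is set up for

    def request_mode(self, mode: str) -> None:
        # An operator edit changes only the requested mode.
        with self._lock:
            self.requested_mode = mode

    def hardware_ready(self, mode: str) -> None:
        # Called once the hardware has finished moving into position.
        with self._lock:
            self.hardware_mode = mode

    def fire(self) -> bool:
        # Check and act under one lock so an edit cannot slip in between.
        with self._lock:
            if self.requested_mode != self.hardware_mode:
                return False  # refuse: software and hardware disagree
            return True
```

A software check like this complements a hardware interlock rather than replacing it, which is the lesson the Therac-25 made painfully clear.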
Boeing 737 MAX MCAS
In October 2018 and March 2019, two Boeing 737 MAX aircraft crashed, taking the lives of 346 people in total. Boeing lost $18.4 billion from grounded flights, lawsuits from bereaved families, and cancelled orders.
Investigations uncovered several flaws in Boeing’s patented Maneuvering Characteristics Augmentation System (MCAS). One major oversight was that the MCAS didn’t account for external events affecting the sensors mounted on the outside of the plane.
While the 737 MAX has two angle-of-attack sensors, the MCAS in use took data from only one of them. When that sensor failed or malfunctioned, the MCAS tried to “correct” a problem that existed only in the faulty reading. This forced both planes into nose dives mid-flight and restricted the pilots’ ability to correct course.
Two of the key flaws in the MCAS were its:
- inability to register information from both sensors
- automated safety measures overriding manual control
Had the software engineers understood the realities of aircraft operation and the need for pilot control, perhaps these issues could have been addressed during development.
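The two flaws listed above can be made concrete with a short sketch. This is not Boeing’s implementation; the function names and the 5-degree disagreement threshold are assumptions chosen purely for illustration. It shows redundant sensor readings being cross-checked before any automated action, with a pilot override always taking priority.

```python
from typing import Optional

# Assumed threshold, for illustration only.
MAX_DISAGREEMENT_DEG = 5.0


def validated_angle_of_attack(
    left_deg: float,
    right_deg: float,
    max_disagreement_deg: float = MAX_DISAGREEMENT_DEG,
) -> Optional[float]:
    """Return the averaged reading, or None if the two sensors disagree."""
    if abs(left_deg - right_deg) > max_disagreement_deg:
        return None  # one sensor is probably faulty; trust neither
    return (left_deg + right_deg) / 2.0


def automation_may_act(left_deg: float, right_deg: float, pilot_override: bool) -> bool:
    """Allow automated correction only with agreeing sensors and no override."""
    if pilot_override:
        return False  # manual control always wins
    return validated_angle_of_attack(left_deg, right_deg) is not None
```

With readings of 4.0 and 25.0 degrees, `validated_angle_of_attack` returns `None`, so the automation stays out of the way instead of acting on a single bad value.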
Fastly Internet Outage
Fastly offers edge computing solutions to help large websites deliver rapid connections to users globally. In June 2021, a bug in its Content Delivery Network (CDN) took major websites offline, including The New York Times, Amazon, and Reddit.
The cause? An undiscovered bug in a recent software update.
When a Fastly customer updated their CDN configuration, the change triggered the bug, causing disruptions in communication across Fastly’s data centers.
This incident underscores the importance of thorough unit testing, bug detection, and code quality review throughout the development process.
Log4j Security Exploit
One of the most serious software engineering fails of 2021 was the Apache Log4j 2 exploit, called Log4Shell. Log4j is a popular library used primarily to track activities or errors that occur during regular use of applications.
On its own, Log4j isn’t malicious, but it could easily be exploited by getting an application to log a specially crafted string of text, such as a chat message or a form field. This allowed attackers to run their own code on the affected machine and gain access to its data.
Since the discovery of the Log4Shell exploit, companies and service providers around the world have been scrambling to fix their vulnerabilities. Apache has released patches, but with affected machines in the hundreds of millions, the risks are still high.
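The general class of bug behind Log4Shell can be sketched in a few lines: a logger that expands lookup expressions inside the message it receives will act on whatever text an attacker manages to get logged. The toy code below is not Log4j and does not use its API; every name is hypothetical, and the “lookup” only reads from a dictionary rather than reaching out over the network.

```python
import re

# Matches expressions of the form ${name}.
LOOKUP = re.compile(r"\$\{(\w+)\}")

# Hypothetical values a lookup can reach; stands in for anything sensitive.
CONTEXT = {"db_password": "s3cret"}


def expand_lookups(text: str) -> str:
    """Replace each ${name} with its value from CONTEXT, if one exists."""
    return LOOKUP.sub(lambda m: CONTEXT.get(m.group(1), m.group(0)), text)


def unsafe_log_line(user_input: str) -> str:
    # Flaw: user input is part of the text that gets expanded.
    return expand_lookups("login failed for " + user_input)


def safer_log_line(user_input: str) -> str:
    # Expand only the trusted template, then insert the input verbatim.
    return expand_lookups("login failed for {}").replace("{}", user_input, 1)
```

The design point is that untrusted input should be treated as data, never as something the logging layer interprets.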
Kyoto University Data Disaster
Innocuous-seeming software updates can often cause big problems, like functional failures, unintended changes in behavior, or data loss. In the case of Kyoto University, a faulty bit of code resulted in the loss of 77 terabytes of critical research data. Research from 14 different groups, amounting to over 34 million files, was lost.
To prevent further data loss, Kyoto University scrapped its backup program and is now actively working to restore functionality. Fortunately, most of the files were recoverable, but for four unlucky research groups, the data is likely gone forever.
The service provider, Hewlett Packard Japan, publicly apologized and took responsibility for the incident. The company also confirmed that it will work to ensure such incidents don’t happen again. However, its credibility has certainly taken a hit in Japan.
What's The Takeaway?
Software engineering principles need to be applied throughout the software development process. They help ensure the security, stability, and reliability not just of new features, but of existing products. It is especially important that software engineers continuously test and re-test code and software configurations in a variety of environments.
Measuring individual developer contributions and reviewing code quality are also necessary to help catch bugs in development before they cause harm.
You can learn more about KPIs for measuring your software development process here.
About the author
Juan Pablo González
Working as Foreworth’s Chief Technical Officer, Juan Pablo (JP) manages the company’s technical strategy. With nearly 20 years of experience in software development, he ensures the development process at Foreworth is meeting its key objectives and technical requirements.