Fail Fast, Ask Questions Later

I recently watched Joseph Anderson, President of the Institute for Process Excellence (IpX), as the featured guest on Joe Hage’s MDG Premium, speaking on the topic “The state of quality in medical devices.” Joseph named the “Fail Fast/Fix-it Fast” mentality as a key driver of quality issues, which inspired me to write this blog post.

Joseph said the following during the webinar (modified from transcript and audio to be usable in an article):

“Failing fast and fixing it once, fine, that’s great! There is nothing wrong with pushing innovation during the development side of a product, process and/or service to rapidly produce a concept. Realizing you didn’t quite hit the mark during testing is no big deal because you are able to make rapid iterations. But what I’m talking about, and this is what we see today with quality escapes, is that this mentality has been used incorrectly.

“If we’re going to fail fast on everything we do, that means your product and your designs are out of control. That means you fail fast all the way into the start of production, and during production you find other issues because you didn’t take your time to review the design properly. You don’t have the right processes to support it, so you’re releasing poor product in manufacturing and you’re trying to catch all those quality escapes at the end, and sooner or later you release a defect to your consumers.”

Quality escapes

So a quality escape is a defect released to your customer/consumer that could have been prevented by applying due diligence during product development. That includes having the right processes, design verification, and other quality measures in place during product development, instead of relying on a set of quality checks at the end of manufacturing. If you want to see the number and range of recalls, check recalls.gov, a website for US consumers with many recorded recalls. For medical devices you can check the FDA’s recall database. The EU also has a recalls website. When I checked the latest weekly report (week 37 at the time of writing), I found 20 recalls categorized as products with serious risks. The products range from cars, toys, electrical appliances, and protective equipment to cosmetics.

Recalls

Over the years we have seen more and more recalls, but does that mean that products have become less safe? Not necessarily. The graph below shows that deaths per million people in the US and deaths per billion vehicle miles traveled have both dropped, even though the population increased significantly over the same period.

Dennis Bratland CC-BY SA 4.0 – Source

The graph below, however, shows that the number of vehicle recalls increased from around 500 per year in 2000 to around 850 per year in 2019. The number of vehicles affected by recalls is also growing, from around 20 million vehicles per year before 2010 to at least 30 to 40 million vehicles in recent years. So do we have a better way to identify required recalls for vehicles to prevent deaths, or did cars really become safer to drive? One thing that is clear to me is that quality issues still exist. Just browse through the various recall databases online and you will find plenty of examples.

Graph made by Martijn Dullaart, based on NHTSA 2019 Recall Annual Count

Cost of fixing defects

According to B. Boehm: “Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase”, which is most likely very similar for hardware defects. So in that context you could argue for Fail Fast/Fix-it Fast. But if you look at recalls in the medical device industry as reported by the FDA, you will find some concerning information. Since 2016, software issues have been the top cause of medical device recalls (ranging from about 20 to 28% in a given quarter, according to the Stericycle Expert Solutions Recall Index). Lisa K. Simone, PhD, a biomedical and software engineer with the Center for Devices and Radiological Health at the U.S. Food and Drug Administration, explains that in 2006 software-related recalls accounted for slightly less than 18%, and that in the late 1980s they accounted for just 6%. In itself that is not strange, because medical devices, like other products, have been using more and more software over the years. But one can also ask the question: did software quality improve at all during the last 40 years? Or was all the Agile Scrum/SAFe, Fail Fast/Learn Fast, DevOps, etc. just one big bubble of air?

Well, I think the answer requires a bit of nuance. Short iterative cycles to get feedback from users/customers are a good thing, as Joseph also indicated. But complex products are often built not by one team but by multiple teams working in parallel. That is when things become tricky. Dependencies and interfaces are critical to manage. Understanding when you impact a dependency or interface is critical, but hard to assess if you have not clearly and concisely documented these interface agreements and dependencies. Add to this the window of opportunity for the market introduction of a new product, and what you get is a Fail Fast, Ask Questions Later mentality: no due diligence in getting requirements clear, no proper design verification, and so on.

As Lisa K. Simone states:

“Straightforward design and coding defects continue to be observed as causal factors in recalls. These are defects that often could be detected using basic tools known to increase software quality.”
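To make the point about documented interface agreements a bit more concrete, here is a minimal sketch in Python of what a basic check could look like. Everything in it is an assumption for illustration: the names FieldSpec, PUMP_STATUS_AGREEMENT and find_violations, the two fields, and their ranges are hypothetical and not taken from any real device or standard. The idea is only that an agreement between two teams is written down as data, so that a message that breaks it is detected the moment it is produced rather than after shipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One agreed field of an interface: name, type, and allowed range."""
    name: str
    type_: type
    minimum: float
    maximum: float


# Hypothetical agreement between a pump team and a display team.
PUMP_STATUS_AGREEMENT = [
    FieldSpec("flow_rate_ml_per_h", float, 0.0, 1200.0),
    FieldSpec("battery_percent", int, 0, 100),
]


def find_violations(message: dict, agreement: list[FieldSpec]) -> list[str]:
    """Return a readable list of the ways a message breaks the agreement."""
    violations = []
    for spec in agreement:
        if spec.name not in message:
            violations.append(f"missing field: {spec.name}")
            continue
        value = message[spec.name]
        if not isinstance(value, spec.type_):
            violations.append(
                f"{spec.name}: expected {spec.type_.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if not spec.minimum <= value <= spec.maximum:
            violations.append(
                f"{spec.name}: {value} outside [{spec.minimum}, {spec.maximum}]"
            )
    # Fields that nobody agreed on are reported as well.
    agreed_names = {spec.name for spec in agreement}
    for name in message:
        if name not in agreed_names:
            violations.append(f"undocumented field: {name}")
    return violations


if __name__ == "__main__":
    bad_message = {"flow_rate_ml_per_h": "250", "battery_percent": 130}
    for problem in find_violations(bad_message, PUMP_STATUS_AGREEMENT):
        print(problem)
```

Run as part of every build or sprint, a check like this turns an interface change made by one team into an immediate, visible finding for the other team, instead of a surprise during production.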

Detect Fast, Fix Fast

When looking more closely at the mantra ‘Fail Fast, Fix-it Fast’, some issues come to the surface that might also trigger the wrong behaviour:

First: the term Fail is used incorrectly, because when you make a mistake and fix it in time, you do not fail. If you do not fix it, you fail. Making mistakes is not the same as failing; you only fail if you do not learn from your mistakes.

Second: you do not want to be fast at making mistakes. You want to detect mistakes when they are made and then fix them, or learn from them, as fast as possible.

A customer review of the result of a sprint (in an Agile way of working) is important, but not sufficient. During the entire development process, measures need to be in place first to prevent mistakes from being made, second to detect mistakes when they are made, not when the product is shipped, and third to fix these mistakes and learn from them once detected.

Therefore, a better mantra would probably be: Detect Fast, Fix-it Fast OR Detect Fast, Learn Fast.

In conclusion

All this leads me to conclude that if Failing Fast means you verify, test, prototype, etc. to find and fix issues in short iterative cycles, then I agree that this concept is important. However, with complexity and time pressure in the mix, Fail Fast, Fix-it Fast will often lead to a mentality of Fail Fast, Ask Questions Later, which in turn leads to quality escapes. This is a shame, as CM2 is a process and methodology framework that ensures quality is built in from the beginning. Sure, it will take time to mature and transform your current practice to embed quality without losing agility. But the rewards are big, not just for your bottom line, but also ethically. If you could have prevented a quality issue with some basic processes and standard tools in place, you should have prevented it. Ignorance is no excuse!

I want to close with the following brain teaser:

Quality does not come at an extra cost, as the intent should always be to deliver per the requirements of the product. There is only extra cost due to non-quality.

Header Photo Grid with Columbia's Debris by NASA - Public Domain - Source
