Problem Solving - Quality - Engineering

I'm Here for the Root Cause.
Not the Excuses.

By grabNade · 8 min read · Engineering Problem Solving

Every engineering failure has two kinds of fixes. The first is fast: replace the part, restart the machine, patch the code. The problem disappears. Until it comes back — in two weeks, or six months, or in a different system entirely. The second fix is slower and harder: find out why it happened in the first place. That's Root Cause Analysis.

RCA isn't a single tool. It's a discipline — a structured refusal to accept the symptom as an answer. And the engineers who practice it are the ones who stop fixing the same problem twice.

## 01. SYMPTOM VS ROOT CAUSE — THE CORE DISTINCTION

A symptom is what you see. A root cause is what created it. The gap between the two is where most engineering time gets wasted.

The classic example: a machine stops on the production line. The symptom is the stoppage. The immediate cause might be a tripped breaker. Replace the breaker and the machine runs again — for now. But why did the breaker trip? Overload. Why was there an overload? A bearing running hot. Why was the bearing running hot? Insufficient lubrication. Why was lubrication insufficient? No preventive maintenance schedule existed for that component.

The root cause isn't the breaker. It's a maintenance gap. Fix only the breaker and the bearing fails completely next month, taking the machine down for a week instead of an hour.

> THE WORKING DEFINITION: A root cause is the deepest identifiable reason for a failure — the one that, if eliminated, prevents recurrence. Not just this incident, but the class of incidents. If fixing it only prevents this specific failure from repeating, you haven't found the root cause yet.

## 02. THE 5 WHYS — THE METHOD EVERYONE KNOWS

The 5 Whys is credited to Sakichi Toyoda and was popularized by Taiichi Ohno as part of the Toyota Production System. It is one of the most widely used RCA techniques in manufacturing, software, and field service. The method is exactly what it sounds like: ask "why" repeatedly until you reach a cause you can actually fix.

Five iterations is a guideline, not a rule. Some problems resolve in three. Complex systemic failures may require seven or eight. The stopping criterion isn't the count — it's reaching a cause that is actionable and whose elimination prevents recurrence.
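The stopping criterion can be sketched in a few lines of Python. This is a minimal illustration, not a formal tool: the `Cause` record and its `actionable` and `prevents_recurrence` flags are assumptions made for the example, and the chain is the machine-stoppage example from section 01 with flags assigned by judgment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cause:
    """One answer in a why-chain (hypothetical record for illustration)."""
    description: str
    actionable: bool            # can the team actually change this?
    prevents_recurrence: bool   # would removing it stop the class of failure?


def find_root_cause(chain: List[Cause]) -> Optional[Tuple[int, Cause]]:
    """Walk the chain from symptom toward origin.

    Returns (number_of_whys, cause) for the first cause that meets both
    stopping criteria, or None if the chain ends before one is found.
    """
    for depth, cause in enumerate(chain, start=1):
        if cause.actionable and cause.prevents_recurrence:
            return depth, cause
    return None


# The machine-stoppage chain from section 01; the flags are judgment calls.
chain = [
    Cause("Breaker tripped", actionable=True, prevents_recurrence=False),
    Cause("Overload", actionable=False, prevents_recurrence=False),
    Cause("Bearing running hot", actionable=True, prevents_recurrence=False),
    Cause("Insufficient lubrication", actionable=True, prevents_recurrence=False),
    Cause("No preventive maintenance schedule for the component",
          actionable=True, prevents_recurrence=True),
]

result = find_root_cause(chain)
if result is not None:
    depth, cause = result
    print(f"Root cause after {depth} whys: {cause.description}")
```

If the function returns `None`, the chain stopped too early: every answer so far is either out of the team's control or only prevents this one incident.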

> 5_WHYS.analysis: Hydraulic System Failure Example

Problem: The hydraulic press stopped mid-cycle.

- W1: Why did the press stop? → The hydraulic pump lost pressure.
- W2: Why did the pump lose pressure? → The pump seal failed and fluid leaked out.
- W3: Why did the pump seal fail? → The seal wore out well before the end of its rated service life.
- W4: Why did the seal wear prematurely? → Contaminated hydraulic fluid was circulating through the system.
- W5: Why was the fluid contaminated? → ROOT CAUSE: No fluid filtration maintenance interval existed in the PM schedule. The filter had never been changed since installation.

The fix at W2 (replace the seal) costs an hour and $40. The fix at W5 (create a PM interval for filter replacement) costs a spreadsheet update and prevents the next three seal failures, two pump replacements, and one unplanned production shutdown.

## 03. OTHER RCA METHODS — WHEN 5 WHYS ISN'T ENOUGH

The 5 Whys works well for linear cause-and-effect chains. But many failures have multiple contributing causes that branch, interact, and compound. For those cases, more structured methods exist.

### Method 01: Fishbone Diagram

Also called the Ishikawa diagram or cause-and-effect diagram. Maps all potential causes of a failure across six standard categories: Machine, Method, Material, Man, Measurement, and Environment. Forces the team to consider the full problem space before converging on a cause.
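Stripped of the drawing, a fishbone diagram is a mapping from category to candidate causes. The sketch below assumes the six categories listed above; the helper names and the candidate causes (loosely based on the hydraulic press example) are hypothetical brainstorm entries, not findings from a real investigation.

```python
# The six standard categories named above.
CATEGORIES = ("Machine", "Method", "Material", "Man", "Measurement", "Environment")


def new_fishbone(effect: str) -> dict:
    """Create an empty diagram: one effect, one empty bone per category."""
    return {"effect": effect, "bones": {c: [] for c in CATEGORIES}}


def add_cause(diagram: dict, category: str, cause: str) -> None:
    """Attach a candidate cause to a bone, rejecting unknown categories."""
    if category not in diagram["bones"]:
        raise ValueError(f"Unknown category: {category}")
    diagram["bones"][category].append(cause)


def unexplored(diagram: dict) -> list:
    """Categories for which nobody has proposed a cause yet."""
    return [c for c, causes in diagram["bones"].items() if not causes]


# Hypothetical brainstorm entries.
diagram = new_fishbone("Hydraulic press stopped mid-cycle")
add_cause(diagram, "Machine", "Pump seal worn")
add_cause(diagram, "Material", "Contaminated hydraulic fluid")
add_cause(diagram, "Method", "No filter-change interval in the PM schedule")

print(unexplored(diagram))  # ['Man', 'Measurement', 'Environment']
```

The empty bones are the useful output: they show which parts of the problem space the team has not yet considered.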

Best for: Complex failures with multiple possible causes, cross-functional teams, manufacturing defects.

### Method 02: Fault Tree Analysis (FTA)

A top-down, deductive method using Boolean logic gates (AND/OR) to map the combinations of events that lead to a top-level failure. Quantifiable when failure probabilities are known. Widely used in aerospace, nuclear, and other safety-critical industries, and standardized in IEC 61025.
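When basic-event probabilities are known, the gate arithmetic is short. For independent inputs, an AND gate multiplies the probabilities and an OR gate computes one minus the product of the complements. The sketch below assumes all basic events are statistically independent, which real systems often violate through common-cause failures; the tree, event names, and probabilities are made up for illustration.

```python
from math import prod


def and_gate(*probs: float) -> float:
    """Output occurs only if every input occurs (independent inputs assumed)."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """Output occurs if at least one input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical per-year probabilities for the basic events.
p_seal_fails = 0.02
p_hose_bursts = 0.01
p_primary_pump_fails = 0.05
p_backup_pump_fails = 0.10

# Fluid is lost if the seal fails OR the hose bursts.
p_fluid_loss = or_gate(p_seal_fails, p_hose_bursts)

# Pumping is lost only if the primary AND the backup pump both fail.
p_pumping_loss = and_gate(p_primary_pump_fails, p_backup_pump_fails)

# Top event: loss of hydraulic pressure from either branch.
p_top = or_gate(p_fluid_loss, p_pumping_loss)
print(f"P(loss of pressure) = {p_top:.4f}")  # P(loss of pressure) = 0.0347
```

The AND gate is why redundancy pays off: two pumps at 5% and 10% combine to a 0.5% branch, well below either single-point failure on the OR side.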

Best for: Safety-critical systems, when multiple simultaneous failures are possible, regulatory compliance.

### Method 03: FMEA

Failure Mode and Effects Analysis. A proactive method, applied before failure occurs, that systematically identifies potential failure modes, their effects, and their likelihood. Each mode gets a Risk Priority Number (RPN), conventionally the product of its severity, occurrence, and detection ratings.
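In the conventional scheme, RPN = severity × occurrence × detection, with each rating usually scored from 1 to 10 and detection scored so that a failure that is harder to catch gets a higher number. The sketch below assumes that 1-to-10 convention; the `FailureMode` class, the failure modes, and their scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA row; ratings assumed to use the common 1-10 scale."""
    name: str
    severity: int     # 10 = most severe effect
    occurrence: int   # 10 = most frequent
    detection: int    # 10 = least likely to be detected in time

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed scale.
        for attr in ("severity", "occurrence", "detection"):
            value = getattr(self, attr)
            if not 1 <= value <= 10:
                raise ValueError(f"{attr} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: product of the three ratings."""
        return self.severity * self.occurrence * self.detection


# Hypothetical scores for a hydraulic press.
modes = [
    FailureMode("Pump seal leak", severity=7, occurrence=4, detection=3),
    FailureMode("Filter clogged", severity=5, occurrence=6, detection=7),
    FailureMode("Hose burst", severity=9, occurrence=2, detection=2),
]

# Highest RPN first.
for mode in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{mode.rpn:4d}  {mode.name}")
```

In this made-up data the most severe mode (hose burst) ranks last, which is one reason teams often review high-severity items regardless of their RPN.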

Best for: New product development, design reviews, process validation, IATF 16949 compliance.

### Method 04: 8D Report

Eight Disciplines problem solving. A structured team-based process that moves from containment (stop the bleeding) through root cause identification to permanent corrective action and prevention. The automotive industry standard for supplier corrective action requests.
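The ordering is what gives 8D its structure: containment comes before root cause, and root cause comes before permanent fixes. A minimal sketch of that ordering follows; the step wording is the commonly listed version and varies between companies, and the `next_open_step` helper is hypothetical.

```python
from typing import Optional, Set

# The eight disciplines as commonly listed (exact wording varies by company).
DISCIPLINES = [
    "D1: Form the team",
    "D2: Describe the problem",
    "D3: Implement interim containment",
    "D4: Identify and verify the root cause",
    "D5: Choose permanent corrective actions",
    "D6: Implement and validate corrective actions",
    "D7: Prevent recurrence",
    "D8: Recognize the team and close",
]


def next_open_step(completed: Set[str]) -> Optional[str]:
    """Return the earliest discipline not yet completed, or None if all are done."""
    for step in DISCIPLINES:
        step_id = step.split(":")[0]
        if step_id not in completed:
            return step
    return None


# Hypothetical report status: a fix was rushed in (D6) before the root cause (D4).
print(next_open_step({"D1", "D2", "D3", "D6"}))
# D4: Identify and verify the root cause
```

Because the helper always returns the earliest open step, a report that jumped straight to a fix still gets sent back to root cause identification.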

Best for: Customer complaints, supplier corrective actions, recurring field failures, automotive and aerospace supply chains.

## 04. CHOOSING THE RIGHT METHOD

| Method | Best Situation | Time Required |
|---|---|---|
| 5 Whys | Simple linear failures, quick investigations, shop floor use | 30 min – 2 hrs |
| Fishbone | Unknown cause, multiple possibilities, team brainstorming | 1 – 4 hrs |
| FTA | Safety systems, complex multi-cause failures, quantified risk | Days – weeks |
| FMEA | Proactive design and process risk, before failure occurs | Days – weeks |
| 8D Report | Customer-facing failures, corrective action documentation | Days – months |

## 05. THE MISTAKES THAT KILL A GOOD RCA

Root cause analysis fails more often from process errors than from technical complexity. The most common ways a good investigation goes wrong:

Stopping at the symptom. The machine failed because the operator didn't follow the procedure. That's an observation, not a root cause. Why didn't the operator follow the procedure? Was the procedure unclear? Was the operator never trained on it? Was it unrealistic under production pressure? The answer is deeper than the first observation.

Assigning blame instead of finding cause. Human error is almost never a root cause — it's a symptom of a system that allows or enables the error. An RCA that concludes "operator error" and stops there has failed. The question is always: what systemic condition made the error possible or likely?

Fixing the finding, not the cause. Retraining the operator after an incident is a containment action, not a corrective action. If the system still allows the same error, the retraining will eventually wear off and the failure will recur — usually with a different operator.

> THE TEST OF A REAL ROOT CAUSE: Ask this: "If we eliminate this cause, does this class of failure become impossible — or just less likely?" If the answer is "impossible," you have the root cause. If the answer is "less likely," keep asking why.

## 06. THE TAKEAWAY

Root cause analysis is not a paperwork exercise. It's the discipline that separates engineers who solve problems from engineers who manage them indefinitely. The symptom is always easier to fix. The excuse is always available. The root cause takes work.

The engineers who find it are the ones who stay one question longer than everyone else is willing to stay. They're not here for the quick fix. They're here for the root cause.
