# I'm Here for the Root Cause. Not the Excuses.
Every engineering failure has two kinds of fixes. The first is fast: replace the part, restart the machine, patch the code. The problem disappears. Until it comes back — in two weeks, or six months, or in a different system entirely. The second fix is slower and harder: find out why it happened in the first place. That's Root Cause Analysis.
RCA isn't a single tool. It's a discipline — a structured refusal to accept the symptom as an answer. And the engineers who practice it are the ones who stop fixing the same problem twice.
## 01. SYMPTOM VS ROOT CAUSE — THE CORE DISTINCTION
A symptom is what you see. A root cause is what created it. The gap between the two is where most engineering time gets wasted.
The classic example: a machine stops on the production line. The symptom is the stoppage. The immediate cause might be a tripped breaker. Reset the breaker and the machine runs again, for now. But why did the breaker trip? Overload. Why was there an overload? A bearing running hot. Why was the bearing running hot? Insufficient lubrication. Why was lubrication insufficient? No preventive maintenance schedule existed for that component.
The root cause isn't the breaker. It's a maintenance gap. Fix only the breaker and the bearing fails completely next month, taking the machine down for a week instead of an hour.
## 02. THE 5 WHYS — THE METHOD EVERYONE KNOWS
Attributed to Sakichi Toyoda and built into the Toyota Production System by Taiichi Ohno, the 5 Whys is one of the most widely used RCA techniques in manufacturing, software, and field service. The method is exactly what it sounds like: ask "why" repeatedly until you reach a cause you can actually fix.
Five iterations is a guideline, not a rule. Some problems resolve in three. Complex systemic failures may require seven or eight. The stopping criterion isn't the count — it's reaching a cause that is actionable and whose elimination prevents recurrence.
Take a failed pump seal as an example. The fix at the first why, W1 (replace the seal), costs an hour and $40. The fix at the fifth why, W5 (create a PM interval for filter replacement), costs a spreadsheet update and prevents the next three seal failures, two pump replacements, and one unplanned production shutdown.
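The stopping criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the `Answer` type, the `ask_why` function, and the two boolean flags are hypothetical names, and the flag values assigned to each step of the tripped-breaker chain from section 01 are illustrative judgments.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Answer(NamedTuple):
    cause: str
    actionable: bool           # can the team actually change this?
    prevents_recurrence: bool  # would removing it stop the failure from returning?


def ask_why(answers: Sequence[Answer]) -> Optional[Tuple[int, str]]:
    """Return (depth, cause) for the first answer that meets the stopping criterion."""
    for depth, answer in enumerate(answers, start=1):
        if answer.actionable and answer.prevents_recurrence:
            return depth, answer.cause
    return None  # criterion not met yet: keep asking


# Answers from the tripped-breaker example; the flags are assumed for illustration.
answers = [
    Answer("tripped breaker", True, False),
    Answer("overload", False, False),
    Answer("bearing running hot", True, False),
    Answer("insufficient lubrication", True, False),
    Answer("no preventive maintenance schedule for the component", True, True),
]

result = ask_why(answers)
print(result)
```

Note that the loop never checks the count. Passing only the first three answers (`ask_why(answers[:3])`) returns `None`, which signals that the investigation stopped too early.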
## 03. OTHER RCA METHODS — WHEN 5 WHYS ISN'T ENOUGH
The 5 Whys works well for linear cause-and-effect chains. But many failures have multiple contributing causes that branch, interact, and compound. For those cases, more structured methods exist.
### Fishbone Diagram
Also called the Ishikawa diagram or cause-and-effect diagram. Maps all potential causes of a failure across six standard categories: Machine, Method, Material, Man, Measurement, and Environment. Forces the team to consider the full problem space before converging on a cause.
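One way to see how the six categories force a wider look is to model the diagram as a simple mapping and ask which categories are still empty. The sketch below is only an illustration: `new_fishbone`, `add_cause`, and `unexplored` are hypothetical helper names, and the two causes entered are borrowed from the bearing example in section 01.

```python
CATEGORIES = ("Machine", "Method", "Material", "Man", "Measurement", "Environment")


def new_fishbone(effect):
    """Create an empty diagram: one list of candidate causes per category."""
    return {"effect": effect, "causes": {category: [] for category in CATEGORIES}}


def add_cause(diagram, category, cause):
    """Record a candidate cause under one of the six standard categories."""
    if category not in diagram["causes"]:
        raise ValueError(f"unknown category: {category!r}")
    diagram["causes"][category].append(cause)


def unexplored(diagram):
    """List the categories the team has not yet brainstormed."""
    return [category for category, causes in diagram["causes"].items() if not causes]


diagram = new_fishbone("machine stops on the production line")
add_cause(diagram, "Machine", "bearing running hot")
add_cause(diagram, "Method", "no preventive maintenance schedule")

print(unexplored(diagram))
```

The remaining categories returned by `unexplored` are the prompt for the next round of brainstorming before the team converges on a cause.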
### Fault Tree Analysis (FTA)
A top-down, deductive method using Boolean logic gates (AND/OR) to map the combinations of events that lead to a top-level failure. Quantifiable when failure probabilities are known. Standard in aerospace, nuclear, and safety-critical systems under IEC 61025.
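The quantification step can be sketched as a small recursive evaluator. This assumes the basic events are statistically independent, so an AND gate multiplies probabilities and an OR gate is one minus the product of the complements. The tree layout, the `probability` function, and the example numbers are hypothetical and not drawn from any real system.

```python
from math import prod


def probability(node):
    """Evaluate a fault tree node: a float is a basic event, a tuple is (gate, children)."""
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    probs = [probability(child) for child in children]
    if gate == "AND":
        # All inputs must occur (independence assumed).
        return prod(probs)
    if gate == "OR":
        # At least one input occurs (independence assumed).
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate!r}")


# Hypothetical top event: loss of flow = power loss OR (primary pump fails AND backup pump fails)
tree = ("OR", [0.01, ("AND", [0.1, 0.1])])

top_event = probability(tree)
print(f"{top_event:.4f}")  # prints 0.0199
```

The redundant pumps contribute 0.1 × 0.1 = 0.01 through the AND gate, which is the same as the single-point power loss. That kind of comparison is what makes the method useful for deciding where redundancy pays off.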
### FMEA
Failure Mode and Effects Analysis. A proactive method, applied before failure occurs, that systematically identifies potential failure modes, their effects, and their likelihood. Each mode gets a Risk Priority Number (RPN), the product of its severity, occurrence, and detection ratings.
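As a rough sketch of the RPN calculation: the number is the product of the three ratings, and modes are then worked in descending order. The 1 to 10 scales (with a higher detection rating meaning the failure is harder to detect) follow common practice but are an assumption here, and the failure modes and their ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (hazardous); assumed scale
    occurrence: int  # 1 (rare) to 10 (almost certain); assumed scale
    detection: int   # 1 (always caught) to 10 (cannot be detected); assumed scale

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings for a pump assembly.
modes = [
    FailureMode("seal leak", severity=6, occurrence=5, detection=4),
    FailureMode("bearing seizure", severity=8, occurrence=3, detection=7),
    FailureMode("clogged filter", severity=4, occurrence=6, detection=2),
]

ranked = sorted(modes, key=lambda mode: mode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN {mode.rpn}")
```

In this made-up data the bearing seizure ranks first even though it occurs least often, because it is both severe and hard to detect.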
### 8D Report
Eight Disciplines problem solving. A structured team-based process that moves from containment (stop the bleeding) through root cause identification to permanent corrective action and prevention. The automotive industry standard for supplier corrective action requests.
## 04. CHOOSING THE RIGHT METHOD
| Method | Best Situation | Time Required |
|---|---|---|
| 5 Whys | Simple linear failures, quick investigations, shop floor use | 30 min – 2 hrs |
| Fishbone | Unknown cause, multiple possibilities, team brainstorming | 1 – 4 hrs |
| FTA | Safety systems, complex multi-cause failures, quantified risk | Days – weeks |
| FMEA | Proactive design and process risk, before failure occurs | Days – weeks |
| 8D Report | Customer-facing failures, corrective action documentation | Days – months |
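The table can be read as a rough decision rule. The sketch below is one possible encoding, not an established procedure: the function name `suggest_method`, its parameters, and especially the order in which the conditions are checked are assumptions made for illustration.

```python
def suggest_method(*, failure_occurred, customer_facing=False,
                   safety_critical=False, cause_chain_linear=True):
    """Suggest an RCA method from the table above (precedence order is an assumption)."""
    if not failure_occurred:
        return "FMEA"        # proactive: nothing has failed yet
    if customer_facing:
        return "8D Report"   # corrective action must be documented for the customer
    if safety_critical:
        return "FTA"         # quantified, multi-cause analysis
    if cause_chain_linear:
        return "5 Whys"      # simple linear failure, quick investigation
    return "Fishbone"        # unknown cause, many possibilities


print(suggest_method(failure_occurred=True))
print(suggest_method(failure_occurred=True, cause_chain_linear=False))
print(suggest_method(failure_occurred=False))
```

In practice the methods combine rather than compete. An 8D report, for instance, often uses a fishbone diagram or 5 Whys for its root cause step.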
## 05. THE MISTAKES THAT KILL A GOOD RCA
Root cause analysis fails more often from process errors than from technical complexity. The most common ways a good investigation goes wrong:
Stopping at the symptom. "The machine failed because the operator didn't follow the procedure" is an observation, not a root cause. Why didn't the operator follow the procedure? Was the procedure unclear? Was the operator untrained? Was the procedure unrealistic under production pressure? The answer is deeper than the first observation.
Assigning blame instead of finding cause. Human error is almost never a root cause — it's a symptom of a system that allows or enables the error. An RCA that concludes "operator error" and stops there has failed. The question is always: what systemic condition made the error possible or likely?
Fixing the finding, not the cause. Retraining the operator after an incident is a containment action, not a corrective action. If the system still allows the same error, the retraining will eventually wear off and the failure will recur — usually with a different operator.
## 06. THE TAKEAWAY
Root cause analysis is not a paperwork exercise. It's the discipline that separates engineers who solve problems from engineers who manage them indefinitely. The symptom is always easier to fix. The excuse is always available. The root cause takes work.
The engineers who find it are the ones who stay one question longer than everyone else is willing to stay. They're not here for the quick fix. They're here for the root cause.