8.3 Root Cause Analysis (RCA) & CAPA

If you are fighting the same fire this week that you fought last month, your CAPA system is broken. Root Cause Analysis (RCA) is the difference between "fixing" a problem and "solving" it. Most organizations stop at the symptom ("Operator error"); true engineering discipline drills down to the systemic failure ("Why did the system allow the error?"). This chapter defines the logic required to permanently kill a defect.

The RCA Mindset: Blame vs. Physics

The first rule of RCA is: You cannot fire your way to quality. If you blame the operator ("Man"), you have failed the investigation. Humans are variable; processes must be robust.

The Investigation Hierarchy:

  1. Physics First: Did the machine or material fail?
  2. Process Second: Did the method allow variation?
  3. People Last: Did the operator willfully violate a clear, physically possible instruction?

The 5 Whys (Drilling for Oil)

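Do not stop at the first "Why." The first answer is usually a symptom. Keep asking until the answer is a systemic cause you can change (a guideline, a design rule, a procedure). Five is a rule of thumb, not a quota; in the example below, the fifth answer is the root cause.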

Example: Solder Short on U12

  • Why 1: Solder bridged two pins. (Symptom)
  • Why 2: Too much solder paste deposited. (Direct Cause)
  • Why 3: The stencil aperture was too large. (Process Cause)
  • Why 4: The stencil design followed the pad 1:1 without reduction. (Design Cause)
  • Why 5: The DFM Guideline for 0.4mm pitch components was outdated. (Systemic Root Cause)

Action: Update the DFM Guideline. Reworking the board alone (Correction) leaves the cause in place, so the problem will happen again tomorrow.
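The U12 chain above can be written down as data, which makes it easy to check whether an investigation stopped too early. This is a minimal sketch, not a prescribed tool: the `Why` class, the `SYSTEMIC_LEVELS` set, and the `root_cause` helper are hypothetical names, and treating only the "Systemic Root Cause" label as deep enough is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Why:
    answer: str
    level: str  # label taken from the example, e.g. "Symptom"


# Assumption: only this label counts as deep enough to drive a corrective action.
SYSTEMIC_LEVELS = {"Systemic Root Cause"}


def root_cause(chain):
    """Return the last answer if the chain reached a systemic level, else None."""
    if not chain:
        return None
    last = chain[-1]
    return last if last.level in SYSTEMIC_LEVELS else None


u12_chain = [
    Why("Solder bridged two pins.", "Symptom"),
    Why("Too much solder paste deposited.", "Direct Cause"),
    Why("The stencil aperture was too large.", "Process Cause"),
    Why("The stencil design followed the pad 1:1 without reduction.", "Design Cause"),
    Why("The DFM Guideline for 0.4mm pitch components was outdated.", "Systemic Root Cause"),
]

found = root_cause(u12_chain)            # the DFM Guideline entry
stopped_early = root_cause(u12_chain[:2])  # None: the chain ended at a direct cause
```

A chain that ends at "Too much solder paste deposited" returns `None`, which is the signal to keep asking.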

The Fishbone (Ishikawa) Logic

Use the "6Ms" to structure your brainstorming. If you don't look at all 6, you will miss the interaction variables.

  • Man: Training, fatigue, visual acuity.
  • Machine: Calibration, wear, settings, maintenance.
  • Material: Vendor changes, shelf life, moisture content.
  • Method: SOP clarity, sequence, tooling.
  • Measurement: Gauge R&R, lighting, parallax error.
  • Mother Nature: Humidity (ESD/MSD), temperature drift.

Pro-Tip: If your Root Cause is "Operator Training," I demand to see the "Method" analysis. Training cannot fix a process that requires superhuman attention span.
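The coverage rule and the Pro-Tip can be sketched as two small checks on a fishbone worksheet. The function names and the sample findings are hypothetical illustrations, not data from a real investigation.

```python
SIX_MS = ("Man", "Machine", "Material", "Method", "Measurement", "Mother Nature")


def missing_categories(findings):
    """Return the 6M categories that have no recorded analysis."""
    return [m for m in SIX_MS if not findings.get(m)]


def needs_method_analysis(root_cause_category, findings):
    """True when the root cause is assigned to Man but Method was never analysed."""
    return root_cause_category == "Man" and not findings.get("Method")


# Hypothetical worksheet: Method and Mother Nature were skipped.
findings = {
    "Man": ["Operator training records reviewed"],
    "Machine": ["Printer calibration checked"],
    "Material": ["Paste shelf life verified"],
    "Measurement": ["Inspection lighting checked"],
}

gaps = missing_categories(findings)          # ["Method", "Mother Nature"]
flag = needs_method_analysis("Man", findings)  # True: send it back
```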

CAPA: Correction vs. Corrective Action

Do not confuse these terms. Quality standards such as ISO 9000 define them separately, and an auditor will treat them as distinct.

Correction (The Band-Aid):

  • Definition: Immediate action to fix the non-conformance.
  • Example: Reworking the solder bridge.
  • Result: The product is good, but the risk remains.

Corrective Action (The Cure):

  • Definition: Action to eliminate the cause of the non-conformance.
  • Example: Redesigning the stencil aperture.
  • Result: The cause is eliminated, so the defect cannot recur through that path.

Preventive Action (The Vaccine):

  • Definition: Action to eliminate the cause of a potential non-conformance before it occurs, for example in other products.
  • Example: Applying the new aperture design rule to all future PCB layouts.
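One way to keep the three terms from blurring together is to give each its own field in the CAPA record. The sketch below is illustrative only: `CapaRecord` and `open_items` are hypothetical names, and requiring all three fields before closure is an assumption, not a rule stated by any standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CapaRecord:
    nonconformance: str
    correction: str = ""          # the Band-Aid
    corrective_action: str = ""   # the Cure
    preventive_actions: List[str] = field(default_factory=list)  # the Vaccine

    def open_items(self):
        """List the fields that are still empty (assumed to block closure)."""
        items = []
        if not self.correction:
            items.append("correction")
        if not self.corrective_action:
            items.append("corrective_action")
        if not self.preventive_actions:
            items.append("preventive_action")
        return items


record = CapaRecord(
    nonconformance="Solder short on U12",
    correction="Reworking the solder bridge",
)
before = record.open_items()  # ["corrective_action", "preventive_action"]

record.corrective_action = "Redesigning the stencil aperture"
record.preventive_actions.append(
    "Applying the new aperture design rule to all future PCB layouts"
)
after = record.open_items()   # []
```

A record that holds only a correction still reports two open items, which matches the point above: the product is good, but the risk remains.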

Final Checklist

| Control Point | Critical Requirement | Risk Avoided |
|---|---|---|
| Depth | Drill down to System/Process level, not "Human Error." | Recurrent Failures |
| Evidence | Root cause must be able to turn the problem On and Off (simulation). | Guesswork |
| Action | CAPA must include process change, not just "Retrain Operator." | Ineffective "Fixes" |
| Verification | Audit effectiveness 30-60 days after close. | Zombie Defects |
| Scope | Apply "Preventive Action" to similar product families. | Cross-Product Contamination |
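
The Verification row is easy to schedule in code. This sketch computes the 30-60 day effectiveness window from a close date; the function names and the three status strings are hypothetical, and counting calendar days rather than working days is an assumption.

```python
from datetime import date, timedelta


def effectiveness_window(closed_on, start_days=30, end_days=60):
    """Return (earliest, latest) dates for the effectiveness audit."""
    return (
        closed_on + timedelta(days=start_days),
        closed_on + timedelta(days=end_days),
    )


def audit_status(closed_on, today):
    """Classify today's date against the audit window (calendar days assumed)."""
    start, end = effectiveness_window(closed_on)
    if today < start:
        return "too early"
    if today > end:
        return "overdue"
    return "in window"


closed = date(2024, 1, 15)
window = effectiveness_window(closed)  # (2024-02-14, 2024-03-15)
status = audit_status(closed, date(2024, 3, 1))
```

An "overdue" status on a closed CAPA is the early warning for a zombie defect: nobody has confirmed that the fix held.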