3.3 Failure mode and effects analysis (FMEA)

The Failure Mode and Effects Analysis (FMEA) is a highly effective, predictive engineering tool. When used correctly, it guides the team to systematically identify what could go wrong and structurally mitigate those risks well before the first physical prototype is even built. If we treat the FMEA simply as a paperwork exercise designed to satisfy an auditor, we miss its true value—and we will likely discover those same failure modes later in the field, where they are significantly more stressful and expensive to resolve.

Traditionally, we quantify risk using the Risk Priority Number (RPN) to help focus our attention on objective data rather than relying purely on instinct or opinion.

Basic Formula: RPN = Severity (S) × Occurrence (O) × Detection (D)

Understanding the Variables:

  • Severity (S): This measures the ultimate impact of the failure on the end user.
    • Important Context: You cannot “inspect” Severity away. It can only be reduced by fundamentally changing the Design itself (for example, by adding a physical safety fuse).
    • 10 = A Hazard or Safety Risk that occurs without any warning.
    • 1 = A minor defect with virtually no discernible effect on the user.
  • Occurrence (O): This represents the probability of the specific root cause actually happening.
    • How to Improve: This score is reduced by improving our Process Capability (Cₚₖ) and establishing more robust design margins.
  • Detection (D): This rates how likely our established control system is to catch the failure before the product escapes the facility. The scale is inverted: the lower the score, the more reliable the check.
    • How to Improve: This score is improved by implementing automated testing and mistake-proofing.
    • 10 = Absolute uncertainty (relying on luck; no check in place).
    • 1 = Proven Error Proofing (the physics or geometry naturally prevents the defect).
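
As a minimal sketch of the formula above, the following Python computes an RPN from the three scores. The function name `rpn` and the 1 to 10 range check are assumptions for illustration (the range is inferred from the anchor values of 1 and 10 listed above); your own scoring table may define the scale differently.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the Risk Priority Number, S x O x D."""
    for name, score in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        # Assumed scale: whole numbers from 1 (best) to 10 (worst).
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


# Hypothetical line item: serious effect, rare cause, weak detection.
print(rpn(severity=8, occurrence=2, detection=7))  # 112
```

Rejecting out-of-range scores early is a design choice in this sketch; it keeps a typo in one column from silently distorting the product.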

Pro-Tip: Try to avoid simply averaging RPN scores across a design. A single line item with a Severity of 10 and a total RPN of 90 (for example, 10 × 3 × 3) is significantly more critical to address than an item with a Severity of 3 and an RPN of 210 (for example, 3 × 7 × 10). Ultimately, safety and compliance must always take precedence over the raw statistics.
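
One way to put this tip into practice is to order the review list by Severity first and use RPN only as a tie-breaker. The sketch below is one possible ordering, not a prescribed method; the `LineItem` class, the `rank_for_review` function, and both example failure modes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def rank_for_review(items):
    """Order items by Severity first, then by RPN, both descending."""
    return sorted(items, key=lambda item: (item.severity, item.rpn), reverse=True)


# Hypothetical line items: low severity with a high RPN, and the reverse.
items = [
    LineItem("cosmetic scratch on housing", severity=3, occurrence=7, detection=10),
    LineItem("battery overheats without warning", severity=10, occurrence=3, detection=3),
]

for item in rank_for_review(items):
    print(f"S={item.severity:2d}  RPN={item.rpn:3d}  {item.name}")
```

Sorting on RPN alone would put the cosmetic scratch (RPN 210) ahead of the overheating battery (RPN 90); sorting on Severity first reverses that order.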

Design FMEA (DFMEA) vs. Process FMEA (PFMEA)

It is very helpful to keep the specific intent of these two documents separate. The Design FMEA protects the theoretical design, while the Process FMEA protects the physical build process.

Design FMEA (DFMEA):

  • Focus: The fundamental physics of the components, the circuit topology, physical geometry, and material properties.
  • Example Failure: “The capacitor derating is insufficient to handle the expected switching voltage spikes.”
  • Mitigation: The engineering team selects a new component with a higher voltage rating.

Process FMEA (PFMEA):

  • Focus: The machine parameters, the operator’s interaction with the product, the factory environment, and the assembly method itself.
  • Example Failure: “The operator inadvertently installs the polarized capacitor backwards on the board.”
  • Mitigation: The team adds a clear polarity marking to the PCB silkscreen and implements an Automated Optical Inspection (AOI) check at that station.

Action Thresholds:

  • When the Severity is 9 or 10 (indicating a Safety or Regulatory concern), mitigation action is necessary, regardless of how low the overall RPN might be.
  • When the total RPN is > 100, it is best practice to develop a structured mitigation plan to bring that number down.
  • When the Detection score is a 10 (often meaning we are relying solely on visual inspection by a human), we should challenge that control. Visual inspection is statistically unreliable for critical features, so we should aim to implement a more robust detection method.
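
The three triggers above can be sketched as a single check that lists every reason a line item needs attention. The function name `action_triggers` and the `rpn_limit` parameter are hypothetical; the limit of 100 is the threshold named above, and your organization may use a different one.

```python
def action_triggers(severity, occurrence, detection, rpn_limit=100):
    """Return the reasons (as strings) a line item needs attention."""
    reasons = []
    rpn = severity * occurrence * detection
    if severity >= 9:
        reasons.append("Severity 9-10: mitigation required regardless of RPN")
    if rpn > rpn_limit:
        reasons.append(f"RPN {rpn} > {rpn_limit}: build a structured mitigation plan")
    if detection == 10:
        reasons.append("Detection 10: challenge the control and improve detection")
    return reasons


# RPN is only 90, yet the Severity trigger still fires.
print(action_triggers(severity=10, occurrence=3, detection=3))

# RPN is 150 and Detection is 10, so two triggers fire.
print(action_triggers(severity=5, occurrence=3, detection=10))
```

The checks are independent rather than chained with `elif`, so an item that trips several triggers reports all of them.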

The AIAG-VDA harmonization (the modern standard)

Modern Quality Engineering is slowly transitioning away from the pure RPN calculation towards Action Priority (AP) levels (High, Medium, Low). The AP system helps prevent subjective score manipulation—such as artificially lowering the Detection score just to get the total RPN below an arbitrary threshold like 100.

Helpful Logic Flow:

  • High Priority: For issues with a Severity of 9-10 accompanied by any meaningful Occurrence rate. Action: We suggest a mandatory review at the Management Level to align resources.
  • Medium Priority: For issues with a Severity of 7-8 and a Moderate Occurrence rate. Action: We suggest a detailed review at the Engineering Level to design a proper fix.
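
The logic flow above can be sketched as a small function. This is a deliberately simplified illustration, not the AIAG-VDA Action Priority table, which is a lookup over Severity, Occurrence, and Detection together. The function name `action_priority` and both Occurrence cutoffs are assumptions made here to give "meaningful" and "moderate" a concrete value, and the fall-through to "Low" is likewise an assumption, since the flow above gives no explicit rule for it.

```python
MEANINGFUL_OCCURRENCE = 2  # assumed cutoff, not taken from any handbook
MODERATE_OCCURRENCE = 4    # assumed cutoff, not taken from any handbook


def action_priority(severity: int, occurrence: int) -> str:
    """Simplified Action Priority sketch; ignores Detection entirely."""
    if severity >= 9 and occurrence >= MEANINGFUL_OCCURRENCE:
        return "High"    # suggested: management-level review
    if 7 <= severity <= 8 and occurrence >= MODERATE_OCCURRENCE:
        return "Medium"  # suggested: engineering-level review
    return "Low"         # assumed default for everything else


print(action_priority(severity=10, occurrence=3))  # High
print(action_priority(severity=8, occurrence=5))   # Medium
print(action_priority(severity=3, occurrence=9))   # Low
```

Because the result depends on which band the scores fall into rather than on their product, nudging a single score down by a point rarely changes the outcome, which is the manipulation the AP approach is meant to discourage.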

Final Checkout: Failure mode and effects analysis (FMEA)

| Control Point | Guiding Principle |
| --- | --- |
| Severity Scoring | A score of 9 or 10 typically indicates a Safety or Regulatory issue. Never lower a Severity score simply based on having “good testing.” |
| Mitigation | Always prefer Prevention (Poka-Yoke) over Detection. Remember that “Retrain Operator” is generally not a valid, long-term engineering fix. |
| Loop Closure | Be sure to re-score the RPN after implementing your mitigation. The new RPN must be lower, or the action was likely ineffective. |
| Living Document | Treat the FMEA as a living document. Update it whenever you encounter a new RMA or major Non-Conformance in the field. |
| Scoring Anchor | Always use a standardized scoring table for S/O/D. Consistent baselines prevent teams from simply guessing the numbers. |
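
The Loop Closure and Severity Scoring checks can be sketched as a comparison of the scores before and after a mitigation. The function name `check_loop_closure` and the `(S, O, D)` tuple layout are hypothetical, and the example scores are invented for illustration.

```python
def check_loop_closure(before, after):
    """Compare (S, O, D) tuples from before and after a mitigation.

    Returns a list of findings; an empty list means no concerns were raised.
    """
    s0, o0, d0 = before
    s1, o1, d1 = after
    findings = []
    if s1 * o1 * d1 >= s0 * o0 * d0:
        findings.append("RPN did not decrease: the action was likely ineffective")
    if s1 < s0:
        # Severity should only drop when the design itself changed.
        findings.append("Severity was lowered: confirm a design change justifies it")
    return findings


# Occurrence and Detection improved, Severity untouched: RPN 240 -> 48.
print(check_loop_closure(before=(8, 5, 6), after=(8, 2, 3)))  # []

# Severity was lowered with nothing else changed: flagged for justification.
print(check_loop_closure(before=(8, 5, 6), after=(6, 5, 6)))
```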