3.3 Failure mode and effects analysis

The Failure Mode and Effects Analysis (FMEA) is a highly effective, predictive engineering tool. When used correctly, it guides the systematic identification of what could go wrong and structurally mitigates those risks well before the first physical prototype is even built. If the FMEA is treated simply as a paperwork exercise designed to satisfy an auditor, its true value is missed—and those same failure modes will likely be discovered later in the field, where they are significantly more stressful and expensive to resolve.

Traditionally, risk is quantified using the Risk Priority Number (RPN) to help focus attention on objective data rather than relying purely on instinct or opinion.

Basic Formula: RPN = Severity (S) × Occurrence (O) × Detection (D)

Understanding the Variables:

  • Severity (S): This measures the ultimate impact of the failure on the end user.
    • Important Context: Severity cannot be “inspected” away. It can only be reduced by fundamentally changing the Design itself (for example, by adding a physical safety fuse).
    • 10 = A Hazard or Safety Risk that occurs without any warning.
    • 1 = A minor defect with virtually no discernible effect on the user.
  • Occurrence (O): This represents the probability of the specific root cause actually happening.
    • How to Improve: This score is reduced by improving Process Capability (Cₚₖ) and establishing more robust design margins.
  • Detection (D): This is the probability that the established control system will catch the failure before the product escapes the facility.
    • How to Improve: This score is improved by implementing automated testing and mistake-proofing.
    • 10 = Absolute uncertainty (relying on luck; no check in place).
    • 1 = Proven Error Proofing (the physics or geometry naturally prevents the defect).
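The formula and the 1 to 10 scales above can be sketched in a few lines of Python. This is a minimal illustration rather than part of any standard: the function name `rpn` and the strict integer range check are assumptions made for the example.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the Risk Priority Number, S x O x D, for scores on a 1-10 scale."""
    for name, score in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        # Assumed convention: every score is an integer from 1 to 10.
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


# Hypothetical line item: serious effect, rare cause, weak detection.
print(rpn(8, 2, 7))  # 112
```

With each score bounded between 1 and 10, the RPN itself ranges from 1 to 1000.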

Pro-Tip: Simply averaging RPN scores across a design must be avoided. A single line item with a Severity of 10 and a total RPN of 90 (for example, 10 × 3 × 3) is significantly more critical to address than an item with a Severity of 4 and an RPN of 200 (for example, 4 × 5 × 10). Ultimately, safety and compliance must always take precedence over the raw statistics.
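One way to put this tip into practice is to rank line items by Severity first and use the RPN only as a tie-breaker. The sketch below assumes a hypothetical `LineItem` record and two invented line items with integer scores (S=10, O=3, D=3 for an RPN of 90; S=4, O=5, D=10 for an RPN of 200); none of these names or values come from a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def prioritize(items):
    """Sort so that higher Severity always wins; RPN only breaks ties."""
    return sorted(items, key=lambda i: (i.severity, i.rpn), reverse=True)


# Hypothetical line items for illustration only.
items = [
    LineItem("cosmetic scratch on enclosure", severity=4, occurrence=5, detection=10),  # RPN 200
    LineItem("battery thermal runaway", severity=10, occurrence=3, detection=3),        # RPN 90
]

for item in prioritize(items):
    print(item.name, item.severity, item.rpn)
```

Sorting on RPN alone would put the cosmetic item first; sorting on the `(severity, rpn)` pair puts the safety item at the top of the list despite its lower RPN.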

It is very helpful to keep the specific intent of the two FMEA documents, the DFMEA and the PFMEA, separate. One protects the theoretical design, while the other protects the physical build process.

DFMEA (Design Failure Mode and Effects Analysis)

  • Focus: The fundamental physics of the components, the circuit topology, physical geometry, and material properties.
  • Example Failure: “The capacitor derating is insufficient to handle the expected switching voltage spikes.”
  • Mitigation: The engineering team selects a new component with a higher voltage rating.

PFMEA (Process Failure Mode and Effects Analysis)

  • Focus: The machine parameters, the operator’s interaction with the product, the factory environment, and the assembly method itself.
  • Example Failure: “The operator inadvertently installs the polarized capacitor backwards on the board.”
  • Mitigation: The team adds a clear polarity marking to the PCB silkscreen and implements an Automated Optical Inspection (AOI) check at that station.

Regardless of document type, the following thresholds indicate when action is needed:

  • When the Severity is 9 or 10 (indicating a Safety or Regulatory concern), mitigation action is necessary, regardless of how low the overall RPN might be.
  • When the total RPN is > 100, it is best practice to develop a structured mitigation plan to bring that number down.
  • When the Detection score is a 10 (often meaning sole reliance on visual inspection by a human), that control must be challenged. Visual inspection is statistically unreliable for critical features, so a more robust detection method must be implemented.
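The three threshold rules above can be expressed as a small screening function. This is a sketch under stated assumptions: the function name `required_actions` and the wording of the returned messages are hypothetical, and the cutoffs are simply the ones quoted in the text.

```python
def required_actions(severity: int, occurrence: int, detection: int) -> list[str]:
    """Return the reasons, if any, that a line item needs mitigation."""
    reasons = []
    if severity >= 9:
        # Safety or regulatory concern: act regardless of the RPN.
        reasons.append("Severity 9-10: mitigation required regardless of RPN")
    if severity * occurrence * detection > 100:
        reasons.append("RPN above 100: develop a structured mitigation plan")
    if detection == 10:
        reasons.append("Detection 10: challenge the control and strengthen detection")
    return reasons


print(required_actions(10, 1, 2))  # RPN is only 20, but Severity alone triggers action
print(required_actions(5, 4, 10))  # RPN 200 and Detection 10: two reasons
print(required_actions(3, 2, 4))   # RPN 24, nothing triggered: []
```

Each rule is checked independently, so an item can be flagged for more than one reason, and a low RPN never hides a Severity of 9 or 10.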

The AIAG-VDA harmonization (the modern standard)

Modern Quality Engineering is slowly transitioning away from the pure RPN calculation towards Action Priority (AP) levels (High, Medium, Low). The AP system helps prevent subjective score manipulation—such as artificially lowering the Detection score just to get the total RPN below an arbitrary threshold like 100.

Helpful Logic Flow:

  • High Priority: For issues with a Severity of 9-10 accompanied by any meaningful Occurrence rate. Action: A review at the Management Level is mandatory, to align resources.
  • Medium Priority: For issues with a Severity of 7-8 and a Moderate Occurrence rate. Action: A detailed review at the Engineering Level is suggested to design a proper fix.
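The simplified logic flow above can be sketched as a lookup function. Several things here are assumptions: the text does not define "meaningful" or "moderate" Occurrence numerically, so the two cutoffs below are hypothetical, and treating everything else as "Low" is also an assumption. The AIAG-VDA handbook itself assigns AP from a full lookup table over Severity, Occurrence, and Detection, which this sketch does not reproduce.

```python
# Assumed cutoffs; the text above does not define them numerically.
MEANINGFUL_OCCURRENCE = 2  # hypothetical: anything above the lowest score
MODERATE_OCCURRENCE = 4    # hypothetical: start of a "moderate" band


def action_priority(severity: int, occurrence: int) -> str:
    """Simplified Action Priority based only on the logic flow in this section."""
    if severity >= 9 and occurrence >= MEANINGFUL_OCCURRENCE:
        return "High"    # management-level review
    if 7 <= severity <= 8 and occurrence >= MODERATE_OCCURRENCE:
        return "Medium"  # engineering-level review
    return "Low"         # assumption: everything not covered above


print(action_priority(10, 3))  # High
print(action_priority(7, 5))   # Medium
print(action_priority(4, 6))   # Low
```

Because no multiplication is involved, lowering a single score by one point cannot quietly push an item under an arbitrary numeric threshold.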

Final Checkout: Failure mode and effects analysis (FMEA)

| Control Point | Guiding Principle |
| --- | --- |
| Severity Scoring | A score of 9 or 10 typically indicates a Safety or Regulatory issue. A Severity score must never be lowered simply based on having “good testing.” |
| Mitigation | Prevention (Poka-Yoke) must always be preferred over Detection. It must be remembered that “Retrain Operator” is generally not a valid, long-term engineering fix. |
| Loop Closure | The RPN must be re-scored after implementing mitigation. The new RPN must be lower, or the action was likely ineffective. |
| Living Document | The FMEA must be treated as a living document. It must be updated whenever a new RMA or major Non-Conformance is encountered in the field. |
| Scoring Anchor | A standardized scoring table for S/O/D must always be used. Consistent baselines prevent teams from simply guessing the numbers. |
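The Severity Scoring and Loop Closure checks can be automated as a simple review of the scores before and after mitigation. The sketch below is illustrative only: the function name `review_rescore`, the `(S, O, D)` tuple layout, and the `design_changed` flag are all assumptions.

```python
def review_rescore(before, after, design_changed=False):
    """Compare (S, O, D) tuples from before and after mitigation.

    Returns a list of findings; an empty list means the re-score looks sound.
    """
    findings = []
    rpn_before = before[0] * before[1] * before[2]
    rpn_after = after[0] * after[1] * after[2]
    if rpn_after >= rpn_before:
        findings.append("RPN did not decrease: the action was likely ineffective")
    if after[0] < before[0] and not design_changed:
        # Severity can only be reduced by changing the design itself.
        findings.append("Severity lowered without a design change")
    return findings


print(review_rescore((7, 5, 8), (7, 2, 3)))  # RPN 280 -> 42: []
print(review_rescore((9, 3, 6), (6, 3, 2)))  # Severity dropped with no design change
```

A re-score that passes both checks has reduced risk through Occurrence or Detection improvements, or through a documented design change, rather than through optimistic scoring.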