3.3 Failure mode and effects analysis
The Failure Mode and Effects Analysis (FMEA) is a predictive engineering tool. Used correctly, it guides the systematic identification of what could go wrong and mitigates those risks before the first physical prototype is built. If the FMEA is treated as a paperwork exercise to satisfy an auditor, its value is lost, and the same failure modes will likely surface later in the field, where they are far more stressful and expensive to resolve.
The mechanics of risk (the RPN engine)
Traditionally, risk is quantified using the Risk Priority Number (RPN) to help focus attention on objective data rather than relying purely on instinct or opinion.
Basic Formula: RPN = Severity (S) × Occurrence (O) × Detection (D)
Understanding the Variables:
- Severity (S): This measures the ultimate impact of the failure on the end user.
  - Important Context: Severity cannot be “inspected” away. It can only be reduced by fundamentally changing the design itself (for example, by adding a physical safety fuse).
  - 10 = A Hazard or Safety Risk that occurs without any warning.
  - 1 = A minor defect with virtually no discernible effect on the user.
- Occurrence (O): This rates how likely the specific root cause is to actually happen.
  - How to Improve: This score is reduced by improving Process Capability (Cₚₖ) and establishing more robust design margins.
- Detection (D): This rates how likely the established control system is to catch the failure before the product leaves the facility. The scale is inverted: the weaker the detection, the higher the score.
  - How to Improve: This score is reduced by implementing automated testing and mistake-proofing.
  - 10 = Absolute uncertainty (relying on luck; no check in place).
  - 1 = Proven Error Proofing (the physics or geometry naturally prevents the defect).
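The RPN formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `FailureMode` class, its field names, and the example scores are hypothetical, and the sketch assumes that all three factors use the same 1 to 10 scale (the text gives anchors only for Severity and Detection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA line item (hypothetical structure for illustration)."""
    description: str
    severity: int    # S: impact on the end user
    occurrence: int  # O: likelihood of the root cause
    detection: int   # D: 10 = no check in place, 1 = proven error proofing

    def __post_init__(self):
        # Assumption: every factor is scored on a 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            score = getattr(self, name)
            if not 1 <= score <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        # RPN = S x O x D
        return self.severity * self.occurrence * self.detection


# Hypothetical scores, chosen only to show the arithmetic.
reversed_cap = FailureMode(
    "Polarized capacitor installed backwards",
    severity=8, occurrence=4, detection=3,
)
print(reversed_cap.rpn)  # 96
```

Because the three factors are multiplied, very different risk profiles can produce the same RPN, which is one reason the individual scores should be read alongside the product.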
Execution strategy: DFMEA vs. PFMEA
It is very helpful to keep the specific intent of these two documents separate. One protects the theoretical design, while the other protects the physical build process.
DFMEA (Design Failure Mode and Effects Analysis)
- Focus: The fundamental physics of the components, the circuit topology, physical geometry, and material properties.
- Example Failure: “The capacitor derating is insufficient to handle the expected switching voltage spikes.”
- Mitigation: The engineering team selects a new component with a higher voltage rating.
PFMEA (Process Failure Mode and Effects Analysis)
- Focus: The machine parameters, the operator’s interaction with the product, the factory environment, and the assembly method itself.
- Example Failure: “The operator inadvertently installs the polarized capacitor backwards on the board.”
- Mitigation: The team adds a clear polarity marking to the PCB silkscreen and implements an Automated Optical Inspection (AOI) check at that station.
Scoring calibration
- When the Severity is 9 or 10 (indicating a Safety or Regulatory concern), mitigation action is necessary, regardless of how low the overall RPN might be.
- When the total RPN exceeds 100, it is best practice to develop a structured mitigation plan to bring that number down.
- When the Detection score is 10 (no effective check in place, which in practice often means sole reliance on visual inspection by a human), that control must be challenged. Visual inspection is statistically unreliable for critical features, so a more robust detection method must be implemented.
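The three calibration rules above can be expressed as independent checks. The following is a rough sketch; the function name `calibration_flags` and the wording of the returned messages are hypothetical, while the thresholds come directly from the rules.

```python
def calibration_flags(severity: int, occurrence: int, detection: int) -> list[str]:
    """Return the calibration rules that a single FMEA line item triggers."""
    flags = []
    # Rule 1: Severity 9 or 10 demands action no matter how low the RPN is.
    if severity >= 9:
        flags.append("severity 9-10: mitigation required regardless of RPN")
    # Rule 2: RPN above 100 calls for a structured mitigation plan.
    if severity * occurrence * detection > 100:
        flags.append("RPN > 100: develop a structured mitigation plan")
    # Rule 3: Detection of 10 means the control itself must be challenged.
    if detection == 10:
        flags.append("detection 10: replace the control with a more robust method")
    return flags


# Hypothetical item: RPN is only 10 * 1 * 2 = 20, yet Severity still forces action.
print(calibration_flags(severity=10, occurrence=1, detection=2))
# ['severity 9-10: mitigation required regardless of RPN']
```

The checks are deliberately not chained with `elif`, since a single line item can trigger more than one rule at once.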
The AIAG-VDA harmonization (the modern standard)
Modern Quality Engineering is slowly transitioning away from the pure RPN calculation towards Action Priority (AP) levels (High, Medium, Low). The AP system helps prevent subjective score manipulation, such as artificially lowering the Detection score just to get the total RPN below an arbitrary threshold like 100.
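A short worked example shows how little it takes to move an item under the threshold. The scores below are hypothetical and exist only to illustrate the arithmetic; the helper name `rpn` is likewise an assumption.

```python
RPN_THRESHOLD = 100  # the arbitrary cutoff mentioned in the text


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Classic Risk Priority Number: S x O x D."""
    return severity * occurrence * detection


# Hypothetical line item: a safety-relevant failure (S = 9).
honest = rpn(severity=9, occurrence=3, detection=4)
# Same item after Detection is re-scored from 4 to 3 with no new control added.
massaged = rpn(severity=9, occurrence=3, detection=3)

print(honest, honest > RPN_THRESHOLD)      # 108 True
print(massaged, massaged > RPN_THRESHOLD)  # 81 False
```

A one-point change in a subjective score drops the item below the cutoff even though Severity is still 9 and nothing about the product or process has changed.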
Helpful Logic Flow:
- High Priority: For issues with a Severity of 9-10 accompanied by any meaningful Occurrence rate. Action: A mandatory review at the Management Level is suggested to align resources.
- Medium Priority: For issues with a Severity of 7-8 and a Moderate Occurrence rate. Action: A detailed review at the Engineering Level is suggested to design a proper fix.
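The two rules above could be sketched as a small classifier. Several parts of this sketch are assumptions rather than anything stated in the text: the numeric cutoffs for a "meaningful" and a "moderate" Occurrence, the choice to treat "moderate" as "moderate or higher", and the fallback to Low for everything else. The published AIAG-VDA Action Priority tables also take Detection into account, so this follows only the simplified flow described here.

```python
# Assumed cutoffs; the text does not assign numbers to these terms.
MEANINGFUL_OCCURRENCE = 2  # hypothetical: anything above the lowest score
MODERATE_OCCURRENCE = 4    # hypothetical: moderate or higher


def action_priority(severity: int, occurrence: int) -> str:
    """Map Severity and Occurrence to an Action Priority level."""
    # High: Severity 9-10 with any meaningful Occurrence.
    if severity >= 9 and occurrence >= MEANINGFUL_OCCURRENCE:
        return "High"
    # Medium: Severity 7-8 with at least a moderate Occurrence.
    if 7 <= severity <= 8 and occurrence >= MODERATE_OCCURRENCE:
        return "Medium"
    # Assumption: everything not matched above is treated as Low.
    return "Low"


print(action_priority(severity=10, occurrence=2))  # High
print(action_priority(severity=8, occurrence=5))   # Medium
print(action_priority(severity=8, occurrence=2))   # Low
```

Under these assumed cutoffs an item with Severity 10 and Occurrence 1 would fall through to Low, so the separate calibration rule for Severity 9 or 10 still needs to be applied alongside this classification.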
Recap: Failure Mode and Effects Analysis Action Prioritization
| Parameter | Threshold | Action Priority | Required Action |
|---|---|---|---|
| Severity (S) | 9 or 10 | High | Mandatory mitigation; management-level review. |
| RPN (S×O×D) | > 100 | Medium | Develop structured mitigation plan. |
| Detection (D) | 10 (e.g., visual inspection) | High | Replace with robust method (e.g., automation, error-proofing). |
| Action Priority (AP) | High (S=9-10 & any O) | High | Mandatory review; management allocates resources. |
| Action Priority (AP) | Medium (S=7-8 & moderate O) | Medium | Engineering-level review to design fix. |