8.5 RMA processing & field failure analysis

Return Merchandise Authorization (RMA) field returns are a crucial source of manufacturing truth. While internal factory yield metrics measure the ability to assemble a product under controlled conditions, RMA data measures the product’s actual reliability in the real world. A robust Field Failure Analysis (FFA) is a scientific engineering discipline. The objective is to identify the physical failure mode, replicate it in the lab, and feed the Corrective Action back into the Design (NPI) or Manufacturing process to prevent recurrence.

Returned units are unknown variables. They may have been installed in hazardous environments (e.g. medical operating rooms, industrial facilities) or subjected to contamination. It is essential to protect engineering staff during intake.

The Decision Logic for Intake:

  • Medical, Chemical, or Industrial deployments: The box must be quarantined immediately on arrival. It should not be opened without a signed Decontamination Certificate from the customer, which prevents exposure to pathogens or chemicals.
  • Damaged Lithium-Ion Batteries: If a battery is visibly swollen, punctured, or thermally damaged, the unit must immediately be classified as Hazardous Material (HAZMAT) and stored in a rated fireproof cabinet or sand enclosure. Attempting to charge or test it is prohibited.
  • All units: Before any technician begins testing, the serial number must be recorded and the external condition of the unit photographed from all sides. This forensic evidence is necessary to differentiate legitimate shipping damage from signs of customer misuse or drop impact.
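
The intake rules above can be sketched as a small triage function. This is a minimal illustration, not a definitive implementation: the `ReturnedUnit` fields, the `IntakeAction` names, and the order in which the checks run are all assumptions made for the example. The quarantine check runs first because a quarantined box is not opened at all.

```python
from dataclasses import dataclass
from enum import Enum


class IntakeAction(Enum):
    # Hypothetical action names, chosen for this sketch only.
    QUARANTINE = "keep box sealed until a signed decontamination certificate arrives"
    HAZMAT_STORAGE = "store in fireproof cabinet or sand enclosure; no charging or testing"
    HOLD_FOR_DOCUMENTATION = "record serial number and photograph all sides"
    RELEASE_TO_TEST = "intake complete; unit may go to verification"


# Deployment types that require a decontamination certificate (from the text).
HAZARDOUS_DEPLOYMENTS = {"medical", "chemical", "industrial"}


@dataclass
class ReturnedUnit:
    serial_number: str           # empty string if not yet recorded
    deployment: str              # e.g. "medical", "office"
    has_decon_certificate: bool
    battery_damaged: bool        # swollen, punctured, or thermally damaged
    photos_taken: bool


def intake_action(unit: ReturnedUnit) -> IntakeAction:
    """Return the first blocking action for a returned unit."""
    if unit.deployment.lower() in HAZARDOUS_DEPLOYMENTS and not unit.has_decon_certificate:
        return IntakeAction.QUARANTINE
    if unit.battery_damaged:
        return IntakeAction.HAZMAT_STORAGE
    if not unit.serial_number or not unit.photos_taken:
        return IntakeAction.HOLD_FOR_DOCUMENTATION
    return IntakeAction.RELEASE_TO_TEST
```

For example, `intake_action(ReturnedUnit("SN-001", "Medical", False, False, False))` returns `IntakeAction.QUARANTINE`, because the missing certificate blocks every later step.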

The verification gate (confirming the failure)

An expensive and frequently unhelpful outcome in any RMA system is NTF (No Trouble Found). An NTF verdict implies that the customer reported the issue incorrectly, but in practice it often points to a gap in factory test coverage.

The Forensic Testing Hierarchy:

  1. Microscopic Visual Inspection: The unit must be inspected for signs of Electrical Overstress (EOS, such as burn marks), liquid ingress (corrosion tracks), or structural impact damage (crushed housings).
    • The Decision: If macroscopic physical damage is evident, electrical testing must be halted. The root cause is likely “Customer Misuse” or “Transit Shock.” Prolonged debugging of structurally compromised boards is unproductive.
  2. Functional Verification: A systematic attempt must be made to replicate the failure using the exact scenario reported by the customer, rather than merely running the standard factory test script.
  3. The NTF Protocol: If the unit passes all standard factory tests:
    • The Action: The unit must be subjected to realistic environmental stressors, such as thermal cycling (-20°C to +70°C) or vibration testing. Intermittent hardware failures (like fractured micro-BGA solder joints) often remain hidden at room temperature.
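
The hierarchy above is an ordered sequence with early exits, which can be sketched as follows. The function name, the `Verdict` values, and the three callables are hypothetical; each callable stands in for a lab procedure and returns `True` when it finds or reproduces a fault. As a simplification, the sketch treats "failure not reproduced in the customer scenario" as the point where the NTF protocol begins.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PHYSICAL_DAMAGE = "physical damage found; electrical testing halted"
    REPRODUCED = "failure reproduced in the customer scenario"
    REPRODUCED_UNDER_STRESS = "failure reproduced under environmental stress"
    NO_TROUBLE_FOUND = "NTF after stress testing"


def verify_return(
    inspect_visual: Callable[[], bool],
    run_customer_scenario: Callable[[], bool],
    run_stress_test: Callable[[], bool],
) -> Verdict:
    """Run the verification steps in order, stopping at the first finding."""
    # Step 1: visible damage ends the process before any electrical test runs.
    if inspect_visual():
        return Verdict.PHYSICAL_DAMAGE
    # Step 2: replicate the scenario the customer reported.
    if run_customer_scenario():
        return Verdict.REPRODUCED
    # Step 3: NTF protocol, e.g. thermal cycling or vibration.
    if run_stress_test():
        return Verdict.REPRODUCED_UNDER_STRESS
    return Verdict.NO_TROUBLE_FOUND
```

Passing callables rather than precomputed booleans keeps the later, more expensive steps from running once an earlier step has already produced a verdict.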

Pro-Tip: Customers may report a “Dead Unit” when the core issue is a highly specific sleep-mode firmware hang. Simply plugging the unit in, seeing an LED illuminate, and sending it back as “Pass” must be avoided. An active attempt must be made to replicate the precise failure environment.

When a physical defect is confirmed, it must be categorized into distinct buckets to assign ownership clearly.

Bucket A: Electrical Overstress (EOS)

  • The Physical Signs: Burnt silicon components, vaporized PCB copper traces, or melted plastic housings.
  • The Physics: A severe external electrical energy surge (e.g. an incorrect power supply, lightning strike, or direct short circuit).
  • The Owner: The Customer (Misapplication) or the Design Team (Inadequate input over-voltage protection).

Bucket B: Electrostatic Discharge (ESD)

  • The Physical Signs: A non-functional board with no external burn marks. Decapsulation followed by SEM (Scanning Electron Microscope) analysis reveals microscopic gate oxide punctures inside the silicon die.
  • The Physics: A latent defect often caused by poor factory grounding during initial assembly or improper unshielded handling.
  • The Owner: SMT Manufacturing Process (Violation of IPC Electrostatic Protected Area standards).

Bucket C: Workmanship / Component Quality

  • The Physical Signs: Dry/cold solder joints, a missing passive component, a resistor placed with the wrong value, or internally defective silicon right from the reel.
  • The Owner: SMT Manufacturing Line or the Component Supplier.
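
The three buckets and their owners can be captured in a small lookup. This is a sketch under stated assumptions: the boolean inputs, the `classify_defect` name, and the priority order (burn damage, then workmanship defects visible on inspection, then die-level ESD evidence) are illustrative choices, not something the process above prescribes.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Bucket(Enum):
    EOS = "Electrical Overstress"
    ESD = "Electrostatic Discharge"
    WORKMANSHIP = "Workmanship / Component Quality"


# Candidate owners per bucket, taken from the bucket descriptions.
BUCKET_OWNERS: Dict[Bucket, Tuple[str, ...]] = {
    Bucket.EOS: ("Customer (misapplication)", "Design Team (inadequate over-voltage protection)"),
    Bucket.ESD: ("SMT Manufacturing Process",),
    Bucket.WORKMANSHIP: ("SMT Manufacturing Line", "Component Supplier"),
}


def classify_defect(
    burn_damage: bool,
    workmanship_defect: bool,
    gate_oxide_puncture: bool,
) -> Optional[Bucket]:
    """Map confirmed physical findings to a bucket; None means unclassified."""
    if burn_damage:            # burnt silicon, vaporized traces, melted housing
        return Bucket.EOS
    if workmanship_defect:     # cold joint, missing or wrong-value part
        return Bucket.WORKMANSHIP
    if gate_oxide_puncture:    # die-level evidence with no external burn marks
        return Bucket.ESD
    return None
```

A call such as `BUCKET_OWNERS[classify_defect(False, False, True)]` then yields the owner tuple for the ESD bucket.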

RMA data should actively and automatically trigger the Corrective Action (CAPA) system.

Trigger Thresholds:

  • Should a Safety Incident occur (e.g. Lithium Fire, Electrical Shock, Thermal Runaway), a Global Stop Ship must be executed and a formal Recall Analysis initiated within hours.
  • Should a newly discovered Failure Mode be detected in the field, an immediate CAR (Corrective Action Request) must be issued to the Design Engineering team to deploy a fix.
  • Should the Repeat Failure Rate for a known issue exceed a specified threshold (e.g. > 1%), a Process Audit of the manufacturing line must be initiated. This indicates the previous fix was incomplete.
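
A minimal sketch of how these triggers might be evaluated for one RMA record is shown below. The function name, the action strings, and the way the repeat failure rate is supplied (as a precomputed fraction) are assumptions for illustration; the 1% default mirrors the example threshold above and should be replaced with the program's own value.

```python
from typing import List, Set


def capa_actions(
    safety_incident: bool,
    failure_mode: str,
    known_failure_modes: Set[str],
    repeat_failure_rate: float,
    repeat_threshold: float = 0.01,  # example value from the text (> 1%)
) -> List[str]:
    """Return the corrective actions triggered by one RMA record."""
    actions: List[str] = []
    if safety_incident:
        # Safety incidents trigger both actions regardless of anything else.
        actions.append("GLOBAL_STOP_SHIP")
        actions.append("RECALL_ANALYSIS")
    if failure_mode not in known_failure_modes:
        # A failure mode not seen before goes to Design Engineering.
        actions.append("CAR_TO_DESIGN_ENGINEERING")
    elif repeat_failure_rate > repeat_threshold:
        # A known issue recurring above threshold means the earlier fix was incomplete.
        actions.append("PROCESS_AUDIT")
    return actions
```

For instance, a known failure mode returning at a 2% rate gives `["PROCESS_AUDIT"]`, while a rate exactly at the threshold triggers nothing because the comparison is strict.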

Final Checkout: RMA processing & field failure analysis

| Control Point | Engineering Requirement | Risk Avoided |
| --- | --- | --- |
| Intake Safety | Verify Decontamination Certs and PPE for medical/industrial units. | Biohazard / Toxic Chemical Exposure. |
| Verification Logic | Replicate using the Customer Environment, rather than the sterile Factory Test script. | False NTF Results. |
| NTF Rate | Engineering target should remain < 10%. Higher rates suggest automated Test Specification gaps. | Unexplained Field Risk. |
| Failure Analysis | Differentiate EOS (External Customer Fault) vs ESD (Internal Factory Fault), using SEM if necessary. | Misassigning deep Liability. |
| Scrap Disposition | Destroy scrapped RMA units appropriately (e.g. crush the main BGA) to prevent reuse. | Gray Market Resale and Warranty Fraud. |
| The Feedback Loop | RMA Metrics must drive immediate updates to the Design FMEA or manufacturing SOPs. | Endlessly Repeating Design Errors. |
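
The NTF Rate control point can be checked with simple arithmetic. The sketch below assumes the rate is defined as NTF verdicts divided by all returns analyzed in the same period; that definition and the function names are assumptions, while the 10% target comes from the checklist.

```python
NTF_TARGET = 0.10  # engineering target from the checklist: NTF rate < 10%


def ntf_rate(ntf_count: int, total_analyzed: int) -> float:
    """Fraction of analyzed returns that closed as No Trouble Found."""
    if total_analyzed <= 0:
        raise ValueError("total_analyzed must be positive")
    if not 0 <= ntf_count <= total_analyzed:
        raise ValueError("ntf_count must be between 0 and total_analyzed")
    return ntf_count / total_analyzed


def ntf_within_target(ntf_count: int, total_analyzed: int, target: float = NTF_TARGET) -> bool:
    """True when the NTF rate is strictly below the target."""
    return ntf_rate(ntf_count, total_analyzed) < target
```

With 12 NTF verdicts out of 150 analyzed returns, the rate is 8% and the check passes; 20 out of 150 (about 13%) would fail and point to a test coverage gap worth investigating.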