6.1 Maintenance Governance: KPIs, Roles, Escalation
Total Productive Maintenance (TPM) is not a cleaning schedule; it is the fiscal discipline of Asset Utilization. In high-volume electronics, a machine sitting idle due to "unplanned downtime" is actively burning capital. We shift the operational model from "Repair when Broken" to "Monitor to Prevent." The goal is not merely to fix machines, but to stabilize process capability (Cpk) so that yield is a constant, not a variable.
Overall Equipment Effectiveness (OEE)
OEE is the non-negotiable metric of truth. It exposes the "hidden factory" of slow cycles and micro-stops. The facility target for critical SMT assets is > 85%.
OEE = Availability × Performance × Quality
1. Availability (A)
- Definition: The ratio of Run Time to Planned Production Time.
- If Changeover (SMED) takes > 15 minutes -> Then the changeover has Failed its target.
- Constraint: Pre-stage feeders and stencils off-line while the machine is running the previous batch.
- If Material Shortage stops the line -> Then Log as "Logistics Loss," not "Maintenance Loss."
2. Performance (P)
- Definition: The ratio of actual output to the output the Designed Cycle Time would allow over the same Run Time (net speed vs. rated speed).
- If Machine is throttled < 95% of rated speed -> Then Engineering Justification is required.
- Physics: Running a chip shooter at 80% to "save the nozzles" masks the root cause of the nozzle failure (likely vacuum leaks or contamination). Fix the vacuum, restore the speed.
3. Quality (Q)
- Definition: First Pass Yield (FPY), the ratio of units that pass inspection the first time, without rework, to total units started.
- If FPY at AOI drops below 98.5% -> Then Stop the line.
- Reason: Producing bad boards faster raises the Performance factor but lowers the Quality factor. A defective board adds nothing to good output, so the net gain is zero.
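The three factors above can be worked through numerically. The following is a minimal sketch, assuming shift totals are already available as plain numbers; the `ShiftCounts` fields and the sample figures are illustrative, not values from a real line.

```python
from dataclasses import dataclass


@dataclass
class ShiftCounts:
    """Hypothetical shift totals; field names are assumptions for this sketch."""
    planned_min: float    # Planned Production Time, minutes
    run_min: float        # Run Time, minutes
    ideal_cycle_s: float  # Designed Cycle Time per board, seconds
    total_units: int      # boards started
    good_units: int       # boards passing AOI the first time


def oee(c: ShiftCounts) -> dict:
    """Return Availability, Performance, Quality and their product."""
    if c.planned_min <= 0 or c.run_min <= 0 or c.total_units <= 0:
        raise ValueError("planned time, run time and unit count must be positive")
    availability = c.run_min / c.planned_min
    # Time the boards *should* have taken, divided by the time they did take.
    performance = (c.ideal_cycle_s * c.total_units) / (c.run_min * 60)
    quality = c.good_units / c.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Illustrative shift: 480 min planned, 432 min running, 30 s ideal cycle.
result = oee(ShiftCounts(480, 432, 30, 820, 810))
print(f"OEE = {result['oee']:.1%}")  # just under the 85% target
```

In this example each factor looks healthy on its own (90%, about 95%, about 99%), yet the product still misses the target, which is why the composite score is tracked rather than the individual factors.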
Pro-Tip: Do not accept "Idle" as a status code. Configure the MES to force the operator to select a specific reason (e.g., "Waiting for Parts," "Nozzle Jam") before the machine can restart. "Idle" is data fog.
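One way to picture the restart interlock is a small validation step that refuses a blank or "Idle" reason and maps each accepted reason to a loss category. This is a sketch only: the reason-code table, the category assigned to "Nozzle Jam", and the function name are assumptions, and a real MES would expose its own configuration mechanism.

```python
from enum import Enum
from typing import Optional


class LossCategory(Enum):
    LOGISTICS = "Logistics Loss"
    MAINTENANCE = "Maintenance Loss"


# Hypothetical reason-code table; a real site would maintain its own list.
REASON_CODES = {
    "Waiting for Parts": LossCategory.LOGISTICS,
    "Nozzle Jam": LossCategory.MAINTENANCE,
}


def authorize_restart(reason: Optional[str]) -> LossCategory:
    """Allow a restart only when a specific, known stop reason is supplied."""
    cleaned = (reason or "").strip()
    if cleaned == "" or cleaned.lower() == "idle":
        raise ValueError("a specific stop reason is required before restart")
    if cleaned not in REASON_CODES:
        raise ValueError(f"unknown reason code: {cleaned!r}")
    return REASON_CODES[cleaned]


print(authorize_restart("Waiting for Parts").value)
```

Because the category is resolved at the moment of restart, a material shortage lands in the logistics bucket automatically instead of being charged to maintenance.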
The Pillars of TPM
Maintenance is a tiered responsibility structure, not a siloed department.
Pillar 1: Autonomous Maintenance (AM)
- Role: The Machine Operator.
- Logic: The person closest to the machine must detect the drift before it becomes a failure.
- Mandate:
  - Start of Shift: Clean sensors and transport rails.
  - Weekly: Inspect linear guides and verify grease levels.
  - Tagging: Apply a physical tag to any leak, noise, or loose bolt for technician review.
Pillar 2: Planned Maintenance (PM)
- Role: Skilled Technician.
- Logic: Restore assets to "Day 1" condition based on usage, not calendar.
- Mandate:
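  - Trigger: PMs are triggered by Run-Hours or Cycle Counts (e.g., 1,000 hrs), never by "Months." A machine running 24/7 accumulates run-hours roughly 3x faster than one running a single 8-hour shift per day, so a calendar interval over-services one and under-services the other.
  - Parts: Replace filters and belts at an interval derived from MTBF (Mean Time Between Failures), before they fail in service.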
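The usage-based trigger can be sketched as a counter check. The 1,000-hour limit comes from the example above; the `PmCounter` structure, the optional cycle limit, and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PmCounter:
    """Hypothetical usage counters since the last PM was closed out."""
    run_hours_since_pm: float
    cycles_since_pm: int = 0
    run_hours_limit: float = 1000.0     # example interval from the text
    cycles_limit: Optional[int] = None  # set when the asset is cycle-limited


def pm_due(c: PmCounter) -> bool:
    """PM is due when either usage counter reaches its limit."""
    hours_hit = c.run_hours_since_pm >= c.run_hours_limit
    cycles_hit = c.cycles_limit is not None and c.cycles_since_pm >= c.cycles_limit
    return hours_hit or cycles_hit


def days_until_due(c: PmCounter, run_hours_per_day: float) -> float:
    """Calendar days until the run-hour limit is reached at a given duty."""
    remaining = max(c.run_hours_limit - c.run_hours_since_pm, 0.0)
    return remaining / run_hours_per_day


fresh = PmCounter(run_hours_since_pm=0)
print(days_until_due(fresh, 24))  # 24/7 operation: about 42 days
print(days_until_due(fresh, 8))   # single 8-hour shift: 125 days
```

The same 1,000-hour interval falls due about three times sooner on the 24/7 machine, which is the gap a fixed monthly schedule cannot see.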
Pillar 3: Focused Improvement (Kobetsu Kaizen)
- Role: Cross-Functional Team (Process + Maintenance).
- Logic: Eradicate chronic losses.
- If Unplanned Downtime > 60 minutes -> Then Mandatory Root Cause Analysis (RCA).
- Output: A physical hardware change or software interlock (Poka-Yoke) to prevent recurrence. "Retraining operator" is not a valid corrective action.
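The RCA rule and the restriction on corrective actions can be expressed as two small checks. This is a sketch under assumptions: the action-type labels (`hardware_change`, `software_interlock`) are hypothetical names for the two accepted outputs described above.

```python
RCA_THRESHOLD_MIN = 60  # unplanned downtime above this requires an RCA

# Hypothetical labels for the corrective-action types the rule accepts.
ACCEPTED_ACTIONS = {"hardware_change", "software_interlock"}


def rca_required(unplanned_downtime_min: float) -> bool:
    """True when the event exceeds the 60-minute RCA trigger."""
    return unplanned_downtime_min > RCA_THRESHOLD_MIN


def validate_corrective_action(action_type: str) -> str:
    """Reject anything that is not a physical change or an interlock."""
    if action_type not in ACCEPTED_ACTIONS:
        raise ValueError(
            f"{action_type!r} is not a valid corrective action; "
            "a hardware change or software interlock is required"
        )
    return action_type


print(rca_required(75))
print(validate_corrective_action("software_interlock"))
```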
Downtime Escalation Matrix
Escalation is not about "reporting trouble"; it is about unlocking resources to shorten Mean Time To Recovery (MTTR). The technician owns the repair, but management owns the barrier removal.
- Level 1: Tactical Support (15 Minutes)
  - Trigger: Machine Down > 15 Minutes.
  - Who: Notify Maintenance Lead.
  - Action: Lead assesses if the technician needs additional hands or diagnostic tools.
- Level 2: Resource Allocation (60 Minutes)
  - Trigger: Machine Down > 60 Minutes.
  - Who: Notify Operations Manager.
  - Action: Manager authorizes expedited shipping for spare parts, approves overtime, or triggers the decision to re-route production.
- Level 3: Strategic Response (4 Hours)
  - Trigger: Machine Down > 4 Hours.
  - Who: Notify Plant Director.
  - Action: Activate Business Continuity Plan (BCP). The Director assumes responsibility for client communication regarding schedule slippage.
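The three levels above reduce to a threshold lookup. A minimal sketch, assuming downtime is tracked in minutes and that notification itself is handled elsewhere; the function name is illustrative.

```python
from typing import List

# (threshold in minutes, role to notify), taken from the matrix above.
ESCALATION_LEVELS = [
    (15, "Maintenance Lead"),
    (60, "Operations Manager"),
    (240, "Plant Director"),
]


def roles_to_notify(downtime_min: float) -> List[str]:
    """Return every role whose trigger the current downtime has exceeded."""
    return [role for threshold, role in ESCALATION_LEVELS if downtime_min > threshold]


for minutes in (10, 20, 90, 300):
    print(minutes, roles_to_notify(minutes))
```

The list is cumulative by design: at 90 minutes both the Lead and the Manager appear, since a higher level adds resources without releasing the lower one.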
Digital Tracking
Manual logbooks are data graveyards. If the machine state is not in the MES, it didn't happen.
- Connectivity: All SMT and Reflow assets must push live state codes to the server.
- Visualization: OEE must be displayed on line-side Andon boards in real-time. Hiding the score hides the problem.
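To show how pushed state codes become a live score, the sketch below turns a timestamped stream of state changes into minutes per state and derives Availability from it. The event format, the `RUN` code, and the sample timestamps are assumptions; actual machine interfaces and MES schemas will differ.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Event = Tuple[datetime, str]  # (time the state began, state code)


def minutes_by_state(events: List[Event], end: datetime) -> Dict[str, float]:
    """Sum the minutes spent in each state; a state lasts until the next event."""
    totals: Dict[str, float] = defaultdict(float)
    closing = events[1:] + [(end, "")]  # pair each event with its successor
    for (start, state), (next_start, _) in zip(events, closing):
        totals[state] += (next_start - start).total_seconds() / 60
    return dict(totals)


# Illustrative two-hour window with one 20-minute stop.
events = [
    (datetime(2024, 1, 8, 6, 0), "RUN"),
    (datetime(2024, 1, 8, 7, 30), "Nozzle Jam"),
    (datetime(2024, 1, 8, 7, 50), "RUN"),
]
totals = minutes_by_state(events, end=datetime(2024, 1, 8, 8, 0))
availability = totals["RUN"] / sum(totals.values())
print(totals, f"{availability:.1%}")
```

A rollup like this only works when every stop carries a specific code, which is why unlabelled "Idle" time cannot be attributed to any loss.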
Final Checklist
Parameter | Metric / Rule | Critical State |
OEE Target | Composite Score | > 85% |
Changeover (SMED) | Duration | < 15 Minutes |
Performance | Speed Derating | < 95% of rated speed requires Engineering Justification |
Quality | FPY Target | > 98.5% |
Escalation L1 | Notify Maintenance Lead | > 15 Minutes |
Escalation L2 | Notify Operations Manager | > 60 Minutes |
Escalation L3 | Notify Plant Director | > 4 Hours |
RCA Trigger | Unplanned Downtime Duration | > 60 Minutes |
Data Logging | Method | Auto-MES (No Paper) |