6.1 Maintenance governance: KPIs, roles & escalation
Total Productive Maintenance (TPM) is more than a cleaning schedule; it is a systematic approach to maximizing Asset Utilization. In high-volume electronics manufacturing, unplanned machine downtime directly wastes capital. The operational model must shift from “Repair when Broken” to “Monitor to Prevent.” The goal is to stabilize process capability (Cₚₖ) continuously, ensuring manufacturing yield remains a predictable and consistent metric.
Overall Equipment Effectiveness (OEE)
OEE is the primary metric for understanding factory floor performance. It reveals inefficiencies like slow cycles, minor jams, and micro-stops that operators might overlook. For critical SMT assets, the facility target should be > 85%.
OEE = Availability × Performance × Quality
1. Availability (a)
- Definition: The ratio of actual Run Time to Planned Production Time.
- Changeover Limit: Any changeover that exceeds 15 minutes requires an immediate engineering review. Following SMED (Single-Minute Exchange of Die) practice, teams should be trained to pre-stage all necessary feeders, carts, and stencils offline while the machine is still running the previous batch.
- Material Shortages: If a material shortage stops the line, it must be logged accurately as a “Logistics Loss,” not a “Maintenance Loss.” Correct categorization is essential for identifying and solving the underlying root cause.
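The two rules above (the 15-minute changeover limit and correct loss categorization) can be sketched in a few lines of Python. This is a minimal illustration, not a real MES interface: the reason codes, the `LossCategory` names, and both helper functions are assumptions made up for the example.

```python
from enum import Enum

CHANGEOVER_LIMIT_MIN = 15  # limit stated in the text


class LossCategory(Enum):
    CHANGEOVER = "Changeover"
    LOGISTICS = "Logistics Loss"
    MAINTENANCE = "Maintenance Loss"


# Hypothetical stop-reason codes; a real line would use its own code list.
REASON_TO_CATEGORY = {
    "CHANGEOVER": LossCategory.CHANGEOVER,
    "MATERIAL_SHORTAGE": LossCategory.LOGISTICS,
    "BREAKDOWN": LossCategory.MAINTENANCE,
}


def classify_stop(reason_code: str) -> LossCategory:
    """Map a stop-reason code to the loss bucket it is booked against."""
    try:
        return REASON_TO_CATEGORY[reason_code]
    except KeyError:
        # Refuse unknown codes so a stop is never silently misbooked.
        raise ValueError(f"unmapped stop reason: {reason_code!r}")


def needs_engineering_review(category: LossCategory, duration_min: float) -> bool:
    """True when a changeover ran longer than the 15-minute limit."""
    return category is LossCategory.CHANGEOVER and duration_min > CHANGEOVER_LIMIT_MIN
```

Rejecting unmapped codes is a deliberate choice in this sketch: a stop that lands in a default bucket would distort the root-cause picture the categorization is meant to protect.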
2. Performance (p)
- Definition: The net speed achieved compared to the machine’s Designed Cycle Time (its nameplate capacity).
- Speed Derating: If a machine is intentionally run at less than 95% of its rated speed, formal engineering justification is required. Running a chip shooter at a reduced speed to “save the nozzles” often masks the real issue, such as vacuum leaks or filter contamination. The solution is to fix the vacuum system and restore the machine to its rated speed.
3. Quality (q)
- Definition: The First Pass Yield (FPY) of good, salable units.
- Yield Drops: Should the FPY at the AOI station fall below 98.5%, the line must be stopped immediately. While producing defective boards faster might temporarily improve the Performance component of OEE, it destroys the Quality component. The result is faster scrap and rework generation, not more usable product.
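To make the OEE formula concrete, here is a minimal Python sketch that computes the three factors from shift-level counters. The `ShiftCounters` fields and the sample numbers are assumptions chosen for illustration; Performance uses the common (ideal cycle time × units produced) / run time form, which is one reasonable reading of "net speed versus designed cycle time".

```python
from dataclasses import dataclass
from typing import List, Tuple

OEE_TARGET = 0.85       # facility target for critical SMT assets
FPY_STOP_LIMIT = 0.985  # line-stop threshold at AOI


@dataclass
class ShiftCounters:
    """Hypothetical per-shift counters; field names are assumptions."""
    planned_time_s: float      # planned production time
    run_time_s: float          # actual run time
    ideal_cycle_time_s: float  # designed (nameplate) cycle time per board
    total_units: int           # boards produced, good or bad
    good_units: int            # boards that passed first time


def oee_components(c: ShiftCounters) -> Tuple[float, float, float]:
    """Return (availability, performance, quality) as fractions."""
    if c.planned_time_s <= 0 or c.run_time_s <= 0 or c.total_units <= 0:
        raise ValueError("counters must be positive to compute OEE")
    availability = c.run_time_s / c.planned_time_s
    performance = (c.ideal_cycle_time_s * c.total_units) / c.run_time_s
    quality = c.good_units / c.total_units
    return availability, performance, quality


def oee(c: ShiftCounters) -> float:
    a, p, q = oee_components(c)
    return a * p * q


def limit_violations(c: ShiftCounters) -> List[str]:
    """List which of the limits stated in the text are breached."""
    _, _, q = oee_components(c)
    found = []
    if oee(c) <= OEE_TARGET:
        found.append("OEE at or below 85% target")
    if q < FPY_STOP_LIMIT:
        found.append("FPY below 98.5%: stop the line")
    return found


# Illustrative 8-hour shift: 90% availability, 95% performance, ~99% FPY.
example = ShiftCounters(28800, 25920, 24, 1026, 1016)
```

With these sample counters the factors are 0.90, 0.95, and about 0.990, giving an OEE of about 0.847. Each factor looks healthy on its own, yet the product still misses the 85% target, which is why the composite number is tracked.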
The pillars of TPM
Maintenance is a tiered, shared responsibility model. It is not solely the domain of a siloed department called only after a breakdown occurs.
Pillar 1: Autonomous Maintenance (AM)
- Primary Owner: The Machine Operator.
- Logic: The operator, who is closest to the machine, is best positioned to detect early signs of deviation before they lead to a major failure.
- Shift Start Requirement: Clean all optical sensors and transport rails.
- Weekly Requirement: Visually inspect linear guides and verify bearing grease levels.
- Tagging Protocol: Apply a physical abnormality tag to any issue—such as an audible air leak, abnormal bearing noise, or loose component—for prompt review by a technician.
Pillar 2: Planned Maintenance (PM)
- Primary Owner: The Skilled Technician.
- Logic: Systematically restore physical assets to optimal condition based on actual usage intensity, not simply the calendar.
- Trigger Mechanics: Schedule PMs based on Run-Hours or Cycle Counts (e.g., every 1,000 operational hours), not just months. A machine running 24/7 will wear out much faster than one running a single shift.
- Parts Replacement: Proactively replace consumable items like pneumatic filters and mechanical belts based on their MTBF (Mean Time Between Failures) specifications, ideally before they fail and cause production downtime.
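A usage-based trigger can be sketched as follows. This is an illustration under stated assumptions: the `PmCounter` fields and both functions are hypothetical, and only the 1,000-hour interval comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PmCounter:
    """Hypothetical usage counters for one asset since its last PM."""
    run_hours_since_pm: float
    cycles_since_pm: int
    run_hour_interval: float = 1000.0     # example interval from the text
    cycle_interval: Optional[int] = None  # optional cycle-count limit


def pm_due(c: PmCounter) -> bool:
    """PM is due when either usage counter reaches its interval."""
    if c.run_hours_since_pm >= c.run_hour_interval:
        return True
    return c.cycle_interval is not None and c.cycles_since_pm >= c.cycle_interval


def weeks_until_pm(run_hours_per_week: float, interval_h: float = 1000.0) -> float:
    """Calendar weeks needed to accumulate one PM interval of run-hours."""
    if run_hours_per_week <= 0:
        raise ValueError("run_hours_per_week must be positive")
    return interval_h / run_hours_per_week
```

For example, `weeks_until_pm(168.0)` (a machine running 24/7) comes out just under 6 weeks, while `weeks_until_pm(40.0)` (a single 40-hour shift pattern) gives 25 weeks. A fixed monthly calendar would over-service one machine and under-service the other.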
Pillar 3: Focused Improvement (Kobetsu Kaizen)
- Primary Owner: A Cross-Functional Team (Process Engineering and Maintenance).
- Logic: Systematically eliminate chronic, recurring losses.
- RCA Trigger: Any unplanned downtime event lasting more than 60 minutes should trigger a mandatory Root Cause Analysis (RCA).
- Corrective Output: The outcome should be a permanent hardware modification or a software interlock (Poka-Yoke) to prevent recurrence. Simply “retraining the operator” is rarely an effective, long-term solution.
Downtime escalation matrix
Escalation is about rapidly mobilizing resources to reduce the Mean Time To Recovery (MTTR). While the technician handles the physical repair, management is responsible for removing organizational barriers.
- Level 1: Tactical Support (15 Minutes)
  - Trigger: Machine down for more than 15 minutes.
  - Owner: Notify the Maintenance Lead.
  - Action: The Lead assesses whether the floor technician needs immediate additional help, advanced diagnostic tools, or schematic support.
- Level 2: Resource Allocation (60 Minutes)
  - Trigger: Machine down for more than 60 minutes.
  - Owner: Notify the Operations Manager.
  - Action: The Manager authorizes expedited shipping for spare parts, approves emergency overtime, or makes the critical decision to reroute production to alternative lines.
- Level 3: Strategic Response (4 Hours)
  - Trigger: Machine down for more than 4 hours.
  - Owner: Notify the Plant Director.
  - Action: Activate the Business Continuity Plan (BCP). The Director assumes responsibility for communicating with critical clients regarding schedule and delivery impacts.
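The matrix above maps cleanly onto a small lookup. The following Python sketch encodes the three thresholds as data; the `EscalationLevel` type and function names are assumptions for illustration, and each trigger is treated as strictly "more than" its threshold, as worded in the text.

```python
from typing import List, NamedTuple, Optional


class EscalationLevel(NamedTuple):
    level: int
    threshold_min: int  # downtime must exceed this many minutes
    notify: str


# Thresholds and owners taken from the matrix; order matters (ascending).
LEVELS = (
    EscalationLevel(1, 15, "Maintenance Lead"),
    EscalationLevel(2, 60, "Operations Manager"),
    EscalationLevel(3, 240, "Plant Director"),
)


def levels_reached(downtime_min: float) -> List[EscalationLevel]:
    """All levels whose trigger has been exceeded, lowest first."""
    return [lvl for lvl in LEVELS if downtime_min > lvl.threshold_min]


def current_level(downtime_min: float) -> Optional[EscalationLevel]:
    """Highest level reached, or None while still under 15 minutes."""
    reached = levels_reached(downtime_min)
    return reached[-1] if reached else None
```

Keeping the thresholds in one table rather than in nested `if` statements means a site can adjust its limits without touching the logic, and `levels_reached` makes it easy to confirm that lower levels were notified before a higher one fires.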
Digital tracking
Manual logbooks are ineffective for data analysis. If machine state information is not recorded directly into the structured database of the Manufacturing Execution System (MES), it cannot be properly tracked or analyzed.
- Connectivity: All critical SMT and Reflow assets must push live state codes, speeds, and error logs directly to the central server.
- Visualization: Display composite Overall Equipment Effectiveness (OEE) scores in real time on line-side Andon boards for high visibility.
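As a rough illustration of what a structured state record buys over a logbook, the Python sketch below models a machine state event and derives Availability from a list of them. Everything here is hypothetical: the `StateEvent` fields, the "RUN"/"STOP" codes, and the JSON payload shape are assumptions, not the schema of any particular MES.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class StateEvent:
    """Hypothetical state record pushed by an asset; field names are assumptions."""
    asset_id: str
    state: str        # e.g. "RUN" or "STOP" (illustrative state codes)
    start_s: int      # seconds since start of planned production
    end_s: int
    error_code: str = ""


def run_time_s(events: Iterable[StateEvent]) -> int:
    """Total seconds spent in the RUN state."""
    return sum(e.end_s - e.start_s for e in events if e.state == "RUN")


def availability(events: Iterable[StateEvent], planned_time_s: int) -> float:
    """Run Time divided by Planned Production Time."""
    if planned_time_s <= 0:
        raise ValueError("planned_time_s must be positive")
    return run_time_s(events) / planned_time_s


def to_payload(event: StateEvent) -> str:
    """Serialize one event as JSON, e.g. for posting to a central server."""
    return json.dumps(asdict(event), sort_keys=True)
```

Once events arrive in this form, the Availability figure shown on an Andon board is a query over recorded intervals rather than a number someone has to reconstruct from handwritten notes.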
Recap: Maintenance Governance KPIs, Roles & Escalation
| Parameter | Target / Limit | Required Action | Escalation Trigger & Owner |
|---|---|---|---|
| Overall Equipment Effectiveness (OEE) | > 85% | Monitor via MES; display real-time on Andon boards. | — |
| Availability (Changeover) | ≤ 15 minutes | If exceeded, initiate immediate engineering review. | >15 min downtime: Notify Maintenance Lead. |
| Performance (Speed) | ≥ 95% of rated speed | If derated <95%, formal engineering justification required. | — |
| Quality (First Pass Yield) | ≥ 98.5% at AOI | If FPY <98.5%, stop line immediately. | — |
| Unplanned Downtime | — | Mandatory Root Cause Analysis (RCA) for events >60 min. | >60 min: Notify Operations Manager. >4 hours: Notify Plant Director. |