
6.1 Maintenance governance: KPIs, roles & escalation

Total Productive Maintenance (TPM) is not a glorified cleaning schedule; it is the strict discipline of Asset Utilization. In high-volume electronics, a machine sitting idle due to unplanned downtime is simply wasting capital. The operational model must be deliberately shifted from “Repair when Broken” to “Monitor to Prevent.” The goal is not merely to fix machines, but to continuously stabilize the process capability (Cₚₖ) so that manufacturing yield remains a constant, highly predictable metric.

Overall Equipment Effectiveness (OEE) is the primary measure of what actually happens on the factory floor. It exposes the “hidden factory” of slow cycles, minor jams, and micro-stops that operators often ignore. The facility target for critical SMT assets is an OEE > 85%.

OEE = Availability × Performance × Quality

Availability

  • Definition: The ratio of actual Run Time to Planned Production Time.
  • Changeover Limit: If a changeover takes > 15 minutes, an immediate engineering review is required. Following SMED practice, the team must be trained to pre-stage all necessary feeders, carts, and stencils off-line while the machine is still running the previous batch.
  • Material Shortages: If a material shortage stops the line, it must be logged as a “Logistics Loss,” not a “Maintenance Loss.” Miscategorized losses prevent the real root cause from being solved.

Performance

  • Definition: The actual net speed achieved relative to the speed implied by the Designed Cycle Time (the machine’s nameplate capacity).
  • Speed Derating: If a machine is deliberately throttled to < 95% of its rated speed, formal engineering justification is required. Running a chip shooter at 80% to “save the nozzles” only masks the root cause of the nozzle failure (which is often vacuum leaks or filter contamination). The vacuum system must be fixed and the rated speed restored.

Quality

  • Definition: First Pass Yield (FPY) of good, salable units.
  • Yield Drops: If FPY at the AOI station drops below 98.5%, the line must be stopped immediately. Building defective boards at full speed keeps the Performance factor high while the Quality factor falls, so the line is only producing expensive scrap faster.
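The three factors above multiply, so a line can look acceptable on each one and still miss the composite target. The following is a minimal sketch of that arithmetic, assuming per-shift counts are available. The `ShiftCounts` fields and the sample numbers are illustrative assumptions, not values from a real line.

```python
from dataclasses import dataclass


@dataclass
class ShiftCounts:
    # All field names and sample values are hypothetical.
    planned_minutes: float      # Planned Production Time
    run_minutes: float          # actual Run Time
    ideal_cycle_seconds: float  # designed (nameplate) cycle time per board
    total_units: int            # all boards built, good or bad
    good_units: int             # boards that passed first time (FPY basis)


def oee_factors(c: ShiftCounts) -> dict:
    """Return the three OEE factors and their product as fractions (0..1)."""
    availability = c.run_minutes / c.planned_minutes
    # Performance: ideal time needed for the boards built vs. time actually run.
    performance = (c.ideal_cycle_seconds * c.total_units) / (c.run_minutes * 60)
    quality = c.good_units / c.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


shift = ShiftCounts(planned_minutes=480, run_minutes=432,
                    ideal_cycle_seconds=30, total_units=820, good_units=812)
factors = oee_factors(shift)
for name, value in factors.items():
    print(f"{name}: {value:.1%}")
print("meets 85% target:", factors["oee"] > 0.85)
```

With these sample counts, availability is 90.0%, performance is about 94.9%, quality is about 99.0%, and the composite OEE is about 84.6%. The line misses the > 85% target even though no single factor looks alarming.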

Pro-Tip: Generic ‘Idle’ should not be accepted as a status code. The MES must be configured to force the operator to select a specific, actionable root cause code (e.g. “Waiting for Parts”, “Nozzle Jam”) before the machine is allowed to restart.
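One way to picture this rule is as a small restart gate, sketched below. It is not the API of any particular MES product. The code table, the `authorize_restart` name, and the “Changeover” category are hypothetical; the two loss categories are the ones named earlier in this section.

```python
# Codes that say nothing actionable and must be rejected (assumed list).
GENERIC_CODES = {"", "IDLE", "OTHER", "UNKNOWN"}

# Hypothetical reason-code table: each code maps to the loss category it is charged to.
REASON_CODES = {
    "WAITING_FOR_PARTS": "Logistics Loss",
    "NOZZLE_JAM": "Maintenance Loss",
    "CHANGEOVER": "Changeover",
}


def authorize_restart(reason_code: str) -> str:
    """Return the loss category for a valid code; raise if the restart must stay blocked."""
    code = reason_code.strip().upper().replace(" ", "_")
    if code in GENERIC_CODES:
        raise ValueError(
            f"generic status {reason_code!r} is not accepted; select a specific cause"
        )
    if code not in REASON_CODES:
        raise ValueError(f"unknown reason code {reason_code!r}")
    return REASON_CODES[code]


print(authorize_restart("Waiting for Parts"))
```

Because the gate returns the loss category along with the approval, a material shortage is charged to logistics at the moment it is logged rather than being reclassified later.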

Maintenance is a tiered, shared-responsibility model, not a siloed department that is called in only after something breaks.

Pillar 1: autonomous maintenance (jishu hozen)

  • Primary Owner: The Machine Operator.
  • Logic: The person standing closest to the machine must detect the first drift from baseline long before it causes a hard failure.
  • Shift Start Mandate: All optical sensors and transport rails must be cleaned.
  • Weekly Mandate: Linear guides must be visually inspected and bearing grease levels visually verified.
  • Tagging Protocol: A physical abnormality tag must be applied to any audible air leak, abnormal bearing noise, or loose component for prompt technician review.

Pillar 2: planned maintenance (keikaku hozen)

  • Primary Owner: The Skilled Technician.
  • Logic: Systematically restore physical assets to optimal operating condition based on actual usage intensity, not calendar days.
  • Trigger Mechanics: PMs must be triggered by Run-Hours or Cycle Counts (e.g., 1,000 operational hours), never simply by “Months.” A machine running 24/7 accumulates run-hours three times faster than one running a single 8-hour shift each day, so a calendar interval services the first machine too late or the second too early.
  • Parts Replacement: Consumable pneumatic filters and mechanical belts must be preemptively replaced based on their MTBF (Mean Time Between Failures) specifications, before they fail and halt the line during production.
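The usage-based trigger can be sketched as a counter check, shown here as a minimal example under stated assumptions. The 1,000-hour limit is the example from the text; the cycle limit, the `PmCounter` class, and the `days_until_pm` helper are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class PmCounter:
    run_hours_since_pm: float
    cycles_since_pm: int
    run_hours_limit: float = 1000.0  # example interval from the text
    cycle_limit: int = 5_000_000     # hypothetical placeholder value

    def pm_due(self) -> bool:
        """PM is due as soon as either usage counter reaches its limit."""
        return (self.run_hours_since_pm >= self.run_hours_limit
                or self.cycles_since_pm >= self.cycle_limit)


def days_until_pm(limit_hours: float, run_hours_per_day: float) -> float:
    """Calendar days needed to accumulate limit_hours at a given daily usage."""
    return limit_hours / run_hours_per_day


print(PmCounter(run_hours_since_pm=1003.5, cycles_since_pm=1_200_000).pm_due())
print(round(days_until_pm(1000, 24), 1), "days on a 24/7 line")
print(round(days_until_pm(1000, 8), 1), "days on a single 8-hour shift")
```

The same 1,000-hour interval arrives after roughly 42 days on a 24/7 line and after 125 days on a single 8-hour shift, which is why a fixed monthly schedule cannot fit both machines.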

Pillar 3: focused improvement (kobetsu kaizen)

  • Primary Owner: A Cross-Functional Team (Process Engineering + Maintenance).
  • Logic: Aggressively eradicate chronic, repeating losses.
  • RCA Trigger: Any incident of Unplanned Downtime > 60 minutes must trigger a Root Cause Analysis (RCA).
  • Corrective Output: The required output is a permanent hardware change or a software interlock (Poka-Yoke) to prevent recurrence. “Retraining the operator” is rarely a valid, long-term corrective action.

Escalation is primarily about rapidly unlocking resources to shorten the Mean Time To Recovery (MTTR). The local technician owns the physical repair, but management owns the removal of organizational barriers.

  • Level 1: Tactical Support (15 Minutes)
    • Trigger: Machine Down > 15 Minutes.
    • Owner: The Maintenance Lead, who must be notified.
    • Action: The Lead assesses whether the floor technician needs additional assistance, advanced diagnostic tools, or schematic support.
  • Level 2: Resource Allocation (60 Minutes)
    • Trigger: Machine Down > 60 Minutes.
    • Owner: The Operations Manager, who must be notified.
    • Action: The Manager authorizes expedited shipping for spare parts, approves emergency overtime, or decides to re-route production to alternative lines.
  • Level 3: Strategic Response (4 Hours)
    • Trigger: Machine Down > 4 Hours.
    • Owner: The Plant Director, who must be notified.
    • Action: Activate the Business Continuity Plan (BCP). The Director assumes direct responsibility for critical client communication regarding schedule and delivery impacts.
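The ladder above reduces to a lookup on elapsed downtime. The sketch below assumes downtime is tracked in minutes; the thresholds and roles come from this section, while the function and constant names are illustrative. It also includes the > 60 minute RCA trigger from the focused-improvement pillar.

```python
from typing import Optional, Tuple

# (threshold in minutes, level, role to notify), ordered from highest to lowest.
ESCALATION_LADDER = [
    (240, 3, "Plant Director"),
    (60, 2, "Operations Manager"),
    (15, 1, "Maintenance Lead"),
]

RCA_THRESHOLD_MINUTES = 60


def escalation_for(downtime_minutes: float) -> Optional[Tuple[int, str]]:
    """Return (level, role) for the highest level whose trigger is exceeded, else None."""
    for threshold, level, role in ESCALATION_LADDER:
        if downtime_minutes > threshold:
            return level, role
    return None


def rca_required(unplanned_downtime_minutes: float) -> bool:
    """Unplanned downtime above 60 minutes requires a Root Cause Analysis."""
    return unplanned_downtime_minutes > RCA_THRESHOLD_MINUTES


for minutes in (10, 20, 75, 300):
    print(minutes, escalation_for(minutes), rca_required(minutes))
```

The comparisons are strict, matching the “>” triggers in the list: a stop of exactly 15 minutes has not yet escalated, while 16 minutes has.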

Manual logbooks are data graveyards. If the machine state is not recorded directly in the structured database of the Manufacturing Execution System (MES), it cannot be tracked or analyzed.

  • Connectivity: All critical SMT and Reflow assets must digitally push live state codes, speeds, and error logs directly to the central server.
  • Visualization: Overall Equipment Effectiveness (OEE) composite scores should be displayed prominently on line-side Andon boards in real time.
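To make “structured” concrete, the sketch below shows one possible shape for a state event that a machine could push to the central server. Everything about the schema is an assumption for illustration: the field names, the `RUN`/`DOWN` state codes, and the `SMT-01` asset ID do not come from any specific MES or equipment interface.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MachineStateEvent:
    asset_id: str               # hypothetical asset identifier
    state_code: str             # e.g. "RUN" or "DOWN" (assumed codes)
    reason_code: Optional[str]  # required whenever the machine is not running
    speed_pct_of_rated: float   # current speed as a percentage of rated speed
    timestamp_utc: str          # ISO 8601 timestamp

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def make_event(asset_id: str, state_code: str, speed_pct_of_rated: float,
               reason_code: Optional[str] = None,
               now: Optional[datetime] = None) -> MachineStateEvent:
    """Build an event, refusing non-running states that lack a reason code."""
    if state_code != "RUN" and not reason_code:
        raise ValueError("a non-running state needs a specific reason code")
    now = now or datetime.now(timezone.utc)
    return MachineStateEvent(asset_id, state_code, reason_code,
                             speed_pct_of_rated, now.isoformat())


event = make_event("SMT-01", "DOWN", 0.0, reason_code="NOZZLE_JAM")
print(event.to_json())
```

A record like this can be queried by asset, state, and reason code, which is the kind of analysis a handwritten logbook entry does not support.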

Final Checkout: Maintenance governance: KPIs, roles & escalation

| Parameter         | Metric / Rule     | Target / Trigger         |
|-------------------|-------------------|--------------------------|
| OEE Target        | Composite Score   | > 85%                    |
| Changeover (SMED) | Duration          | < 15 Minutes             |
| Performance       | Speed Derating    | Prohibited (< 95%)       |
| Quality           | FPY Target        | > 98.5%                  |
| Escalation L1     | Notify Lead       | > 15 Minutes             |
| Escalation L2     | Notify Manager    | > 60 Minutes             |
| RCA Trigger       | Downtime Duration | > 60 Minutes             |
| Data Logging      | Method            | Automated MES (No Paper) |