4.4 Escalation SLAs

Hope is not a strategy. When a line stops, the clock starts ticking against the factory's P&L. An "Escalation SLA" (Service Level Agreement) is a programmed set of rules that governs the human response to downtime. It removes emotion and ambiguity: if the line is down, the system summons help automatically.

The SLA Matrix by Event Type

Define distinct workflows for different failure modes. A specialized machine error requires a different responder than a cardboard shortage.

| Event Class | Trigger Condition | Primary Responder (Tier 1) | Response SLA (Max Time) |
| --- | --- | --- | --- |
| Machine Down | Machine State = Error > 5 Mins OR Operator "Maint Call" Button. | Maintenance Technician | 10 Minutes |
| Quality Stop | Consecutive Yield < 90% OR Critical Test Fail (e.g., Hipot). | Process / Quality Engineer | 15 Minutes |
| Material Starved | "Feeder Low" Warning OR Operator "Material Call" Button. | Water Spider / Logistics | 5 Minutes |
| Traceability Gap | System Interlock: "Genealogy Link Missing" or "Profile Mismatch". | MES Super User / Quality Admin | 10 Minutes |
| IT/Network | Server Ping Fail OR HMI Freeze. | IT Support (L1) | 5 Minutes |

Logic:

  • If Event occurs → Then Timer starts (T=0).
  • If Responder scans badge at station < SLA → Then Timer Pauses (State = "In Progress").
  • If Timer > SLA with no badge scan → Then Trigger Escalation (Tier 2).
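
The timer logic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Event` class, the `timer_state` function, and the use of plain minute counters instead of real timestamps are all assumptions made for the example. The SLA values are copied from the matrix.

```python
from dataclasses import dataclass
from typing import Optional

# Response SLAs in minutes, copied from the SLA matrix by event class.
RESPONSE_SLA_MIN = {
    "Machine Down": 10,
    "Quality Stop": 15,
    "Material Starved": 5,
    "Traceability Gap": 10,
    "IT/Network": 5,
}


@dataclass
class Event:
    """Hypothetical event record; times are minutes on a shared clock."""
    event_class: str
    opened_min: float                 # T=0 for this event
    ack_min: Optional[float] = None   # time of the responder's badge scan, if any


def timer_state(event: Event, now_min: float) -> str:
    """Return "Open", "In Progress", or "Escalate Tier 2" for an event."""
    sla = RESPONSE_SLA_MIN[event.event_class]
    if event.ack_min is not None and event.ack_min - event.opened_min < sla:
        return "In Progress"          # badge scanned inside the SLA: timer pauses
    if now_min - event.opened_min > sla:
        return "Escalate Tier 2"      # SLA exceeded without a timely badge scan
    return "Open"                     # still inside the SLA window, waiting
```

Keeping the SLA values in one lookup table means a plant can tune a single number per event class without touching the escalation logic.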

The Escalation Hierarchy (Automatic Promotion)

The system does not care about rank or feelings. If the problem is not solved, the notification moves up the chain of command.

Tier 1: The Tactical Response (T + 0 min)

  • Who: Line Technician, Line Lead, Water Spider.
  • Notification: Andon Board (Yellow/Red), Smart Watch/Pager.
  • Goal: Quick fix / Reset.

Tier 2: The Engineering Response (T + 15 min)

  • Who: Process Engineer, Maintenance Supervisor, Quality Manager.
  • Trigger: Tier 1 failed to fix (or failed to acknowledge) within 15 minutes.
  • Notification: SMS / Mobile Push Notification.
  • Goal: Root cause analysis, advanced troubleshooting.

Tier 3: The Executive Response (T + 60 min)

  • Who: Plant Manager, Director of Operations.
  • Trigger: Line down > 1 Hour.
  • Notification: Email / Urgent SMS.
  • Goal: Resource reallocation, overtime authorization, customer impact assessment.
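
As a rough sketch, the automatic promotion can be reduced to a lookup on elapsed downtime. The function name `current_tier` is hypothetical, and treating both thresholds as strict "greater than" comparisons is an assumption; the text only states "within 15 minutes" and "> 1 Hour".

```python
from typing import Tuple

TIER_2_AFTER_MIN = 15   # Tier 1 failed to fix or acknowledge within 15 minutes
TIER_3_AFTER_MIN = 60   # line down for more than 1 hour


def current_tier(elapsed_min: float) -> Tuple[int, str]:
    """Return (tier, notification channel) for an unresolved event."""
    if elapsed_min < 0:
        raise ValueError("elapsed_min must be non-negative")
    if elapsed_min > TIER_3_AFTER_MIN:
        return 3, "Email / Urgent SMS"
    if elapsed_min > TIER_2_AFTER_MIN:
        return 2, "SMS / Mobile Push Notification"
    return 1, "Andon Board, Smart Watch/Pager"
```

Because the tier depends only on elapsed time, the same function can drive both the notifications and the Andon display.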

Traceability Gap Protocol (Special Handling)

A Traceability Gap (e.g., "Parent unit passed, but Child component has no scan record") is not a machine fault; it is a Compliance Breach.

  • Severity: Critical.
  • Action: Immediate Hard Stop of the line section.
  • Responder: Must be a System Admin or Quality Manager. Operators cannot override genealogy errors.
  • Resolution: Manual data patch (if physical proof exists) or Scrap Unit.
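
The responder and resolution rules above amount to a role check followed by a two-way decision. The sketch below assumes roles arrive as plain strings and that "physical proof" has already been judged by a person; `resolve_genealogy_gap` is a hypothetical helper, not part of any specific MES.

```python
# Roles allowed to clear a genealogy error, per the protocol above.
AUTHORIZED_ROLES = {"System Admin", "Quality Manager"}


def resolve_genealogy_gap(role: str, has_physical_proof: bool) -> str:
    """Return the permitted resolution, or refuse if the role may not override."""
    if role not in AUTHORIZED_ROLES:
        raise PermissionError(f"{role} cannot override genealogy errors")
    # A manual patch is only allowed when physical proof exists; otherwise scrap.
    return "Manual Data Patch" if has_physical_proof else "Scrap Unit"
```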

Closure Rules: The "Confession"

Closing a ticket is a data entry event. The system must force the responder to categorize the failure before the line allows a restart.

Mandatory Fields

  • Root Cause Code: Select from standardized tree (e.g., M_Motor_Fail, Q_Solder_Bridge). "Other" is banned.
  • Action Taken: Brief text description (e.g., "Replaced sensor X").
  • Duration: Auto-calculated (Time_Closed - Time_Opened).
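
One way to enforce these fields is to validate them in the function that closes the ticket. The sketch below is illustrative: `close_ticket` and the two-entry `ROOT_CAUSE_CODES` set are assumptions, and a real code tree would be loaded from the MES rather than hard-coded.

```python
from datetime import datetime

# Hypothetical excerpt of the standardized root cause tree.
ROOT_CAUSE_CODES = {"M_Motor_Fail", "Q_Solder_Bridge"}


def close_ticket(opened: datetime, closed: datetime,
                 root_cause: str, action_taken: str) -> dict:
    """Validate the mandatory fields and return the closure record."""
    # "Other" is rejected simply because it is not in the standardized tree.
    if root_cause not in ROOT_CAUSE_CODES:
        raise ValueError(f"root cause {root_cause!r} is not in the standardized tree")
    if not action_taken.strip():
        raise ValueError("Action Taken must not be empty")
    if closed < opened:
        raise ValueError("ticket cannot close before it opens")
    return {
        "root_cause": root_cause,
        "action_taken": action_taken.strip(),
        # Duration is calculated, never typed in: Time_Closed - Time_Opened.
        "duration_min": (closed - opened).total_seconds() / 60,
    }
```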

The "Micro-Stop" Filter

  • Scenario: The machine faults and the operator resets it immediately (Duration < 2 mins).
  • Logic: Do not demand a manual entry. Auto-log as System_Microstop.
  • Review: If Count(Microstops) > 10 per hour → Trigger Tier 2 Alert (Chronic Issue).
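
The filter and its chronic-issue review could look like the sketch below. It interprets "per hour" as a rolling 60-minute window, which is an assumption, and the class name `MicrostopFilter` and its return shape are made up for the example.

```python
from collections import deque

MICROSTOP_MAX_MIN = 2         # stops shorter than this are auto-logged
CHRONIC_COUNT_PER_HOUR = 10   # more than this many in an hour alerts Tier 2


class MicrostopFilter:
    def __init__(self):
        # End times (in minutes) of micro-stops seen in the last 60 minutes.
        self._recent = deque()

    def record_stop(self, end_min: float, duration_min: float):
        """Return (log_code, tier_2_alert) for a finished stop."""
        if duration_min >= MICROSTOP_MAX_MIN:
            return "Manual Entry Required", False
        self._recent.append(end_min)
        # Drop micro-stops that have fallen out of the rolling one-hour window.
        while end_min - self._recent[0] > 60:
            self._recent.popleft()
        alert = len(self._recent) > CHRONIC_COUNT_PER_HOUR
        return "System_Microstop", alert
```

For example, eleven one-minute stops logged within the same hour would raise the alert flag on the eleventh call.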

Verification Scan

  • Rule: A Maintenance ticket is not closed until the machine successfully produces 1 Good Unit.
  • Logic:
    1. Tech fixes machine.
    2. Tech updates status to "Verify".
    3. Operator runs unit.
    4. If Result = Pass → Then Ticket Auto-Closes.
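
The four steps above form a small state machine. In this sketch the class and method names are hypothetical, and sending a failed verification back to "Open" is an assumption, since the text only describes the Pass path.

```python
from enum import Enum


class TicketStatus(Enum):
    OPEN = "Open"
    VERIFY = "Verify"
    CLOSED = "Closed"


class MaintenanceTicket:
    def __init__(self):
        self.status = TicketStatus.OPEN

    def mark_fixed(self):
        """Tech finished the repair and sets the status to "Verify"."""
        self.status = TicketStatus.VERIFY

    def record_unit_result(self, passed: bool):
        """Operator ran one unit; only a Pass during Verify closes the ticket."""
        if self.status is not TicketStatus.VERIFY:
            return  # results outside the Verify state do not affect the ticket
        # Assumption: a Fail reopens the ticket; the text does not cover this case.
        self.status = TicketStatus.CLOSED if passed else TicketStatus.OPEN
```

The point of the design is that no person closes the ticket directly: closure is a side effect of the machine producing a good unit.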

Final Checklist

| Category | Metric / Control | Threshold / Rule |
| --- | --- | --- |
| Response | Time-to-Ack | Tier 1 must acknowledge (Scan Badge) within defined SLA (5-15 mins). |
| Escalation | Auto-Trigger | If T > 15m (unresolved) → Then Auto-SMS to Tier 2. |
| Data | Taxonomy | Closure requires specific Root Cause Code (No "Misc"). |
| Integrity | Verification | Maintenance tickets require 1 "Pass" cycle to close. |
| Compliance | Traceability | Genealogy gaps require Manager-level override to clear. |
| Visibility | Andon | Andon Board reflects Escalation Level (e.g., Flashing Speed increases). |
| Analysis | Pareto | Weekly review of top offenders based on Sum(Duration) by Reason Code. |
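
The weekly Pareto review, Sum(Duration) by Reason Code, is a simple aggregation over closed tickets. The sketch below assumes each ticket is available as a `(reason_code, duration_min)` pair; the function name and the sample values in the usage note are illustrative only.

```python
from collections import defaultdict


def pareto_by_reason(tickets):
    """Sum downtime per reason code and sort with the worst offender first.

    tickets: iterable of (reason_code, duration_min) pairs.
    """
    totals = defaultdict(float)
    for reason_code, duration_min in tickets:
        totals[reason_code] += duration_min
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Given made-up tickets such as `[("M_Motor_Fail", 30), ("Q_Solder_Bridge", 12), ("M_Motor_Fail", 45)]`, the function ranks `M_Motor_Fail` first with 75 minutes of total downtime.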