4.4 Escalation SLAs
Hope is not a strategy. When a line stops, the clock starts ticking against the factory's P&L. An "Escalation SLA" (Service Level Agreement) is a programmed set of rules that governs the human response to downtime. It removes emotion and ambiguity: if the line is down, the system summons help automatically.
The SLA Matrix by Event Type
Define distinct workflows for different failure modes. A specialized machine error requires a different responder than a cardboard shortage.
Event Class | Trigger Condition | Primary Responder (Tier 1) | Response SLA (Max Time) |
Machine Down | Machine State = Error for > 5 Minutes OR Operator "Maint Call" Button. | Maintenance Technician | 10 Minutes |
Quality Stop | Consecutive Yield < 90% OR Critical Test Fail (e.g., Hipot). | Process / Quality Engineer | 15 Minutes |
Material Starved | "Feeder Low" Warning OR Operator "Material Call" Button. | Water Spider / Logistics | 5 Minutes |
Traceability Gap | System Interlock: "Genealogy Link Missing" or "Profile Mismatch". | MES Super User / Quality Admin | 10 Minutes |
IT/Network | Server Ping Fail OR HMI Freeze. | IT Support (L1) | 5 Minutes |
Logic:
- If Event occurs → Then Timer starts (T=0).
- If Responder scans badge at station before the SLA expires → Then Timer Pauses (State = "In Progress").
- If Timer > SLA with no badge scan → Then Trigger Escalation (Tier 2).
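The matrix and the three timer rules above can be sketched as a lookup table plus one decision function. This is a minimal illustration, not a reference implementation: the names `RESPONSE_SLA_MIN`, `Event`, and `response_state` are hypothetical, times are plain minutes rather than real clock timestamps, and the handling of the exact boundary T = SLA is an assumption because the rules do not define it.

```python
from dataclasses import dataclass
from typing import Optional

# Response SLAs in minutes, taken from the matrix above.
RESPONSE_SLA_MIN = {
    "Machine Down": 10,
    "Quality Stop": 15,
    "Material Starved": 5,
    "Traceability Gap": 10,
    "IT/Network": 5,
}


@dataclass
class Event:
    event_class: str                 # one of the keys in RESPONSE_SLA_MIN
    opened_min: float                # time the event occurred (T=0), in minutes
    ack_min: Optional[float] = None  # time of the responder's badge scan, if any


def response_state(event: Event, now_min: float) -> str:
    """Apply the three timer rules to one event at time now_min."""
    sla = RESPONSE_SLA_MIN[event.event_class]
    # A badge scan inside the SLA pauses the timer.
    if event.ack_min is not None and event.ack_min - event.opened_min < sla:
        return "In Progress"
    # The timer ran past the SLA without a timely scan.
    if now_min - event.opened_min > sla:
        return "Escalate to Tier 2"
    # The timer is still running. T == SLA exactly is not defined by the
    # rules above; this sketch treats it as still waiting (assumption).
    return "Waiting"
```

For example, `response_state(Event("Material Starved", 0.0), 6.0)` falls through to the escalation branch because 6 minutes exceeds the 5-minute SLA and no badge scan was recorded.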
The Escalation Hierarchy (Automatic Promotion)
The system does not care about rank or feelings. If the problem is not solved, the notification moves up the chain of command.
Tier 1: The Tactical Response (T + 0 min)
- Who: Line Technician, Line Lead, Water Spider.
- Notification: Andon Board (Yellow/Red), Smart Watch/Pager.
- Goal: Quick fix / Reset.
Tier 2: The Engineering Response (T + 15 min)
- Who: Process Engineer, Maintenance Supervisor, Quality Manager.
- Trigger: Tier 1 failed to fix (or failed to acknowledge) within 15 minutes.
- Notification: SMS / Mobile Push Notification.
- Goal: Root cause analysis, advanced troubleshooting.
Tier 3: The Executive Response (T + 60 min)
- Who: Plant Manager, Director of Operations.
- Trigger: Line down > 1 Hour.
- Notification: Email / Urgent SMS.
- Goal: Resource reallocation, overtime authorization, customer impact assessment.
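As a rough sketch, the three tiers above reduce to a table keyed by tier number and a function that maps elapsed downtime to the active tier. The names `TIERS` and `current_tier` are hypothetical, and treating the thresholds as strict (escalate only once the elapsed time exceeds 15 or 60 minutes) is an assumption based on the "Line down > 1 Hour" wording.

```python
# Tier definitions from the hierarchy above. "after_min" is the elapsed
# downtime (in minutes) that must be exceeded before the tier is notified.
TIERS = {
    1: {
        "after_min": 0,
        "who": ["Line Technician", "Line Lead", "Water Spider"],
        "notify": ["Andon Board", "Smart Watch/Pager"],
    },
    2: {
        "after_min": 15,
        "who": ["Process Engineer", "Maintenance Supervisor", "Quality Manager"],
        "notify": ["SMS", "Mobile Push Notification"],
    },
    3: {
        "after_min": 60,
        "who": ["Plant Manager", "Director of Operations"],
        "notify": ["Email", "Urgent SMS"],
    },
}


def current_tier(elapsed_min: float) -> int:
    """Return the highest tier whose threshold has been exceeded."""
    tier = 1  # Tier 1 is always active from T + 0
    for level in sorted(TIERS):
        if elapsed_min > TIERS[level]["after_min"]:
            tier = level
    return tier
```

Keeping the thresholds in data rather than in branching code means a plant can retune the 15 and 60 minute limits without touching the promotion logic.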
Traceability Gap Protocol (Special Handling)
A Traceability Gap (e.g., "Parent unit passed, but Child component has no scan record") is not a machine fault; it is a Compliance Breach.
- Severity: Critical.
- Action: Immediate Hard Stop of the line section.
- Responder: Must be a System Admin or Quality Manager. Operators cannot override genealogy errors.
- Resolution: Manual data patch (if physical proof exists) or Scrap Unit.
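The responder restriction and the two resolution paths above could be enforced with a small guard like the one below. This is only a sketch: `OVERRIDE_ROLES` and `resolve_traceability_gap` are hypothetical names, and role strings stand in for whatever identity system the MES actually uses.

```python
# Roles allowed to clear a genealogy error, per the protocol above.
OVERRIDE_ROLES = {"System Admin", "Quality Manager"}


def resolve_traceability_gap(role: str, physical_proof_exists: bool) -> str:
    """Return the permitted resolution, or refuse if the role cannot override."""
    if role not in OVERRIDE_ROLES:
        # Operators (and any other role) cannot override genealogy errors.
        raise PermissionError(f"{role} cannot override genealogy errors")
    # A manual data patch is only allowed when physical proof exists;
    # otherwise the unit is scrapped.
    return "Manual data patch" if physical_proof_exists else "Scrap Unit"
```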
Closure Rules: The "Confession"
Closing a ticket is a data entry event. The system must force the responder to categorize the failure before the line is allowed to restart.
Mandatory Fields
- Root Cause Code: Select from standardized tree (e.g., M_Motor_Fail, Q_Solder_Bridge). "Other" is banned.
- Action Taken: Brief text description (e.g., "Replaced sensor X").
- Duration: Auto-calculated (Time_Closed - Time_Opened).
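One way to enforce the mandatory fields is to validate them in a single closure function, sketched below. The function name `close_ticket` and the returned dictionary layout are hypothetical, and `ROOT_CAUSE_CODES` contains only the two example codes named above; a real tree would be much larger.

```python
from datetime import datetime

# Only the two example codes from the text; a real standardized tree is larger.
# "Other" is rejected simply because it is not part of the tree.
ROOT_CAUSE_CODES = {"M_Motor_Fail", "Q_Solder_Bridge"}


def close_ticket(
    root_cause_code: str,
    action_taken: str,
    time_opened: datetime,
    time_closed: datetime,
) -> dict:
    """Validate the mandatory closure fields and compute the duration."""
    if root_cause_code not in ROOT_CAUSE_CODES:
        raise ValueError(f"Unknown root cause code: {root_cause_code!r}")
    if not action_taken.strip():
        raise ValueError("Action Taken must not be empty")
    # Duration is never typed in by the responder; it is derived.
    duration = time_closed - time_opened
    return {
        "root_cause_code": root_cause_code,
        "action_taken": action_taken.strip(),
        "duration_min": duration.total_seconds() / 60,
    }
```

Because the code is checked against a closed set, free-text categories such as "Other" fail validation without needing a separate ban list.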
The "Micro-Stop" Filter
- Scenario: The machine faults and the operator resets it immediately (Duration < 2 mins).
- Logic: Do not demand a manual entry. Auto-log as System_Microstop.
- Review: If Count(Microstops) > 10 per hour → Trigger Tier 2 Alert (Chronic Issue).
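The micro-stop filter can be sketched as a rolling one-hour window over auto-logged stops. The class name `MicrostopMonitor` and its return strings are hypothetical, times are plain minutes, and interpreting "per hour" as a sliding 60-minute window (rather than a fixed clock hour) is an assumption.

```python
from collections import deque


class MicrostopMonitor:
    """Auto-logs short stops and flags a chronic issue when they pile up."""

    def __init__(self, max_per_hour: int = 10, micro_limit_min: float = 2.0):
        self.max_per_hour = max_per_hour
        self.micro_limit_min = micro_limit_min
        self.stamps = deque()  # start times (minutes) of recent micro-stops

    def record_stop(self, start_min: float, duration_min: float) -> str:
        # Stops of 2 minutes or longer follow the normal closure rules.
        if duration_min >= self.micro_limit_min:
            return "Manual entry required"
        self.stamps.append(start_min)
        # Drop micro-stops that have fallen out of the sliding 60-minute window.
        while self.stamps and start_min - self.stamps[0] >= 60:
            self.stamps.popleft()
        if len(self.stamps) > self.max_per_hour:
            return "Tier 2 Alert (Chronic Issue)"
        return "System_Microstop"
```

With the defaults, the eleventh micro-stop inside any 60-minute span is the first one to raise the Tier 2 alert, matching the "> 10 per hour" rule.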
Verification Scan
- Rule: A Maintenance ticket is not closed until the machine successfully produces 1 Good Unit.
- Logic:
  - Tech fixes machine.
  - Tech updates status to "Verify".
  - Operator runs unit.
  - If Result = Pass → Then Ticket Auto-Closes.
  - If Result = Fail → Then Ticket stays open.
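The verification steps above amount to a small state machine, sketched here. The names `Status` and `MaintenanceTicket` are hypothetical, and sending a failed unit back to "In Progress" is an assumption; the rule itself only says the ticket does not close until a good unit is produced.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "In Progress"
    VERIFY = "Verify"
    CLOSED = "Closed"


class MaintenanceTicket:
    def __init__(self) -> None:
        self.status = Status.IN_PROGRESS

    def mark_fixed(self) -> None:
        """Tech has fixed the machine and requests a verification run."""
        if self.status is not Status.IN_PROGRESS:
            raise RuntimeError("Ticket is not in progress")
        self.status = Status.VERIFY

    def record_unit(self, passed: bool) -> None:
        """Operator ran one unit; only a pass can close the ticket."""
        if self.status is not Status.VERIFY:
            raise RuntimeError("Ticket is not awaiting verification")
        # Assumption: a failed unit returns the ticket to the technician.
        self.status = Status.CLOSED if passed else Status.IN_PROGRESS
```

There is deliberately no method that sets `Status.CLOSED` directly, so the only path to closure runs through a passing unit.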
Final Checklist
Category | Metric / Control | Threshold / Rule |
Response | Time-to-Ack | Tier 1 must acknowledge (Scan Badge) within defined SLA (5-15 mins). |
Escalation | Auto-Trigger | If T > 15m (unresolved) → Then Auto-SMS to Tier 2. |
Data | Taxonomy | Closure requires specific Root Cause Code (No "Misc"). |
Integrity | Verification | Maintenance tickets require 1 "Pass" cycle to close. |
Compliance | Traceability | Genealogy gaps require Manager-level override to clear. |
Visibility | Andon | Andon Board reflects Escalation Level (e.g., Flashing Speed increases). |
Analysis | Pareto | Weekly review of top offenders based on Sum(Duration) by Reason Code. |
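The weekly Pareto review in the last row is a grouped sum followed by a descending sort. A minimal sketch, assuming closed tickets are available as (reason code, duration in minutes) pairs; the function name `pareto_by_reason` and that input shape are hypothetical.

```python
from collections import defaultdict


def pareto_by_reason(tickets: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sum downtime per reason code and rank the worst offenders first."""
    totals: dict[str, float] = defaultdict(float)
    for reason_code, duration_min in tickets:
        totals[reason_code] += duration_min
    # Largest Sum(Duration) first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Ranking by summed duration rather than by ticket count keeps one long outage from being hidden behind many short ones.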