5.3 Support Model (L1/L2/L3), Incident Response, Monitoring

A deployed system without a defined support architecture is a dormant failure waiting for a trigger. Operational stability requires a structured filtration mechanism that resolves routine friction at the lowest level while preserving high-level engineering capacity for architectural triage. Treat support not as a helpdesk, but as a continuity engine.

The Tiered Support Structure

Implement a rigid tiered system to prevent engineering fatigue. The objective is to filter "Noise" (User Error/Config) from "Signal" (Code Defects).

Tier 1: The Frontline (Helpdesk / Super Users)

  • Scope: Hardware connectivity (scanners, printers), user access reset, basic "How-to" questions.
  • Resolution Target: 60–70% of all incoming tickets.
  • Logic:
    • If issue is "User cannot log in" → Reset Password / Check AD Group.
    • If scanner is unresponsive → Replace Battery / Check Wi-Fi Profile.
    • If issue is unknown → Gather Screenshots & Logs → Escalate to L2.
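
The L1 rules above amount to a lookup with a default escalation path. The following is a minimal sketch of that idea, not a reference implementation; the issue codes (`login_failure`, `scanner_unresponsive`) and the names `L1Action`, `L1_PLAYBOOK`, and `triage_l1` are hypothetical and would be replaced by whatever categories your helpdesk tool defines.

```python
from enum import Enum


class L1Action(Enum):
    RESET_PASSWORD_OR_CHECK_AD_GROUP = "Reset password / check AD group"
    REPLACE_BATTERY_OR_CHECK_WIFI = "Replace battery / check Wi-Fi profile"
    ESCALATE_TO_L2 = "Gather screenshots and logs, then escalate to L2"


# Hypothetical issue codes (assumption): one entry per documented L1 fix.
L1_PLAYBOOK = {
    "login_failure": L1Action.RESET_PASSWORD_OR_CHECK_AD_GROUP,
    "scanner_unresponsive": L1Action.REPLACE_BATTERY_OR_CHECK_WIFI,
}


def triage_l1(issue_code: str) -> L1Action:
    """Return the documented L1 action, or escalate when the issue is unknown."""
    return L1_PLAYBOOK.get(issue_code, L1Action.ESCALATE_TO_L2)


if __name__ == "__main__":
    for code in ("login_failure", "scanner_unresponsive", "label_misprint"):
        print(code, "->", triage_l1(code).value)
```

Keeping the playbook as data rather than branching code means a newly documented fix can be added to L1 without touching the routing logic.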

Tier 2: Application Support (System Analysts)

  • Scope: Data inconsistencies, master data configuration, SQL data patches, workflow logic validation.
  • Resolution Target: 20–25% of tickets.
  • Logic:
    • If "Order not visible on line" → Verify ERP-MES Interface logs.
    • If data requires correction → Apply Standard Operating Procedure (SOP) fix.

Tier 3: Engineering (Developers / Architects)

  • Scope: Code bugs, performance bottlenecks, architectural failure, security patches.
  • Entry Gate: L3 accepts tickets only with reproduction steps and log extracts provided by L2.
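
The entry gate can be enforced mechanically before a ticket reaches an engineer. The sketch below assumes a hypothetical `Ticket` record with `repro_steps` and `log_extracts` fields; real ticketing systems will expose these differently, so treat the field names as placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    # Hypothetical ticket shape (assumption), not tied to any ticketing product.
    ticket_id: str
    repro_steps: list[str] = field(default_factory=list)
    log_extracts: list[str] = field(default_factory=list)


def missing_for_l3(ticket: Ticket) -> list[str]:
    """List the evidence L2 still has to attach before L3 accepts the ticket."""
    missing = []
    if not any(step.strip() for step in ticket.repro_steps):
        missing.append("reproduction steps")
    if not any(log.strip() for log in ticket.log_extracts):
        missing.append("log extracts")
    return missing


def accept_into_l3(ticket: Ticket) -> bool:
    """The gate passes only when nothing is missing."""
    return not missing_for_l3(ticket)


if __name__ == "__main__":
    incomplete = Ticket("T-1", repro_steps=["Open order screen", "Click Release"])
    print(accept_into_l3(incomplete), missing_for_l3(incomplete))
```

Returning the list of missing items, rather than a bare rejection, tells L2 exactly what to add before resubmitting.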

Pro-Tip: Empower L1 with a "Known Error Database" (KEDB). If a fix is documented, it belongs in L1, regardless of technical complexity. This shifts the load left.

Service Level Agreements (SLA)

Define SLAs based on business impact, not user urgency. A "Line Down" event supersedes all other engineering tasks.

Severity Definitions

  • Sev 1 (Critical): Production Halt. No workaround available. Financial loss is immediate.
    • Response: 15 min. Update Freq: Every 1 hr.
  • Sev 2 (High): Production is degraded, or a workaround exists but is painful. Includes performance issues.
    • Response: 2 hours. Update Freq: Every 4 hrs.
  • Sev 3 (Standard): Single user error, cosmetic bug, or non-blocking feature failure.
    • Response: 8 hours (Next Business Day).
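
The response and update targets above can be held as data so that deadlines are computed rather than remembered. This is a rough sketch under simplifying assumptions: it uses plain wall-clock time and ignores business calendars, so the "Next Business Day" reading of Sev 3 is not modeled. The names `SlaTarget`, `SLA_TARGETS`, `response_deadline`, and `is_response_breached` are illustrative.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class SlaTarget(NamedTuple):
    response: timedelta
    update_every: Optional[timedelta]  # None where no update frequency is defined


# Values taken from the severity definitions; Sev 3 states no update frequency.
SLA_TARGETS = {
    1: SlaTarget(timedelta(minutes=15), timedelta(hours=1)),
    2: SlaTarget(timedelta(hours=2), timedelta(hours=4)),
    3: SlaTarget(timedelta(hours=8), None),
}


def response_deadline(severity: int, opened_at: datetime) -> datetime:
    """Latest time by which a first response is due (wall-clock assumption)."""
    return opened_at + SLA_TARGETS[severity].response


def is_response_breached(severity: int, opened_at: datetime, now: datetime) -> bool:
    """True when no response has been given and the deadline has passed."""
    return now > response_deadline(severity, opened_at)


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 8, 0)
    print(response_deadline(1, opened))
```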

Prioritization Logic

  • If Line Utilization = 0% → Declare Sev 1.
  • If a single user is blocked but the Line is running → Declare Sev 3.
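
The prioritization rules can be expressed as a small function keyed on line state rather than on how loudly the reporter complains. This is a sketch only: `line_utilization_pct` and `production_degraded` are assumed inputs, and the Sev 2 branch is inferred from the severity definitions, not from an explicit rule in this section.

```python
def declare_severity(line_utilization_pct: float, production_degraded: bool = False) -> int:
    """Map line state to a severity level (1 = Critical, 3 = Standard)."""
    if line_utilization_pct <= 0:
        # Line is not producing at all: production halt.
        return 1
    if production_degraded:
        # Assumption: degraded output maps to Sev 2, per the severity definitions.
        return 2
    # Line is running normally; an individual blocked user is Sev 3.
    return 3


if __name__ == "__main__":
    print(declare_severity(0.0))
    print(declare_severity(55.0, production_degraded=True))
    print(declare_severity(92.0))
```

Note that the number of affected users is deliberately not an input: one blocked user on a running line stays at Sev 3 regardless of urgency.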

On-Call & Vendor Escalation

Fatigue causes errors. Structure on-call rotations to ensure engineers are rested and rational during crises.

On-Call Protocol

  • Rotation: Weekly rotation. Primary and Secondary engineers must be defined.
  • Alerting: Configure monitoring tools (e.g., Datadog, Nagios) to page On-Call only for Sev 1/Sev 2 events.
  • Compensation: Formalize "Time Off in Lieu" or financial stipends to prevent burnout.
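
A weekly rotation with a defined Primary and Secondary, plus the Sev 1/Sev 2 paging filter, could look like the sketch below. It is tool-agnostic and does not use any Datadog or Nagios API; `ROTATION_START`, `on_call_pair`, and `should_page` are hypothetical names, and the choice of next-in-line as Secondary is an assumption.

```python
from datetime import date

# Assumed anchor for week boundaries: Monday, 1 January 2024.
ROTATION_START = date(2024, 1, 1)

# Only these severities page the on-call engineer.
PAGING_SEVERITIES = {1, 2}


def on_call_pair(engineers: list[str], day: date) -> tuple[str, str]:
    """Return (primary, secondary) for the week containing `day`."""
    if len(engineers) < 2:
        raise ValueError("Need at least two engineers for Primary and Secondary")
    week_index = (day - ROTATION_START).days // 7
    primary = engineers[week_index % len(engineers)]
    # Assumption: the Secondary is next week's Primary.
    secondary = engineers[(week_index + 1) % len(engineers)]
    return primary, secondary


def should_page(severity: int) -> bool:
    """Sev 3 events wait for business hours; Sev 1 and Sev 2 page immediately."""
    return severity in PAGING_SEVERITIES


if __name__ == "__main__":
    team = ["ana", "ben", "chen"]
    print(on_call_pair(team, date(2024, 1, 10)), should_page(3))
```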

Vendor Escalation Strategy

External vendors (ERP providers, Hardware suppliers) typically charge for support or have strict SLAs. Do not burn credits on invalid claims.

  • Pre-Escalation Checklist:
    1. Reproduce the issue in the QA/Staging environment.
    2. Isolate the variable (Standard Product vs. Customization).
    3. If Custom Code is the root cause → Internal Fix (L3).
    4. If Standard Product fails → Open Vendor Ticket.
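
The checklist above is a short decision procedure, sketched here as a function. The names `RootCause`, `Route`, and `vendor_escalation_route` are illustrative, and the "keep investigating" outcome for issues that have not yet been reproduced or isolated is an assumption implied by the ordering of the steps.

```python
from enum import Enum
from typing import Optional


class RootCause(Enum):
    CUSTOM_CODE = "custom"
    STANDARD_PRODUCT = "standard"


class Route(Enum):
    KEEP_INVESTIGATING = "Reproduce and isolate before escalating"
    INTERNAL_FIX_L3 = "Internal fix (L3)"
    OPEN_VENDOR_TICKET = "Open vendor ticket"


def vendor_escalation_route(
    reproduced_in_staging: bool, root_cause: Optional[RootCause]
) -> Route:
    """Apply the pre-escalation checklist in order."""
    # Steps 1 and 2: no ticket until the issue is reproduced and isolated.
    if not reproduced_in_staging or root_cause is None:
        return Route.KEEP_INVESTIGATING
    # Step 3: customizations are ours to fix.
    if root_cause is RootCause.CUSTOM_CODE:
        return Route.INTERNAL_FIX_L3
    # Step 4: only standard-product defects go to the vendor.
    return Route.OPEN_VENDOR_TICKET


if __name__ == "__main__":
    print(vendor_escalation_route(True, RootCause.STANDARD_PRODUCT).value)
```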

Pro-Tip: When opening a vendor ticket, state the "Business Impact" in currency (e.g., "$50k/hour downtime"). A quantified impact often moves the ticket past the vendor's L1 support and in front of their escalation engineers sooner.

Final Checklist

| Category | Metric / Control | Threshold / Rule |
|---|---|---|
| Triage | L1 Resolution Rate | ≥ 60% of total volume |
| Process | Escalation Quality | 100% of L2→L3 tickets have Logs + Repro Steps |
| Response | Sev 1 Response Time | ≤ 15 Minutes (24/7) |
| Vendor | External Tickets | Only for Standard Product defects |
| Access | Privileged Access | Only L2/L3 have Write access to DB |
| Documentation | KEDB Updates | 1 New Article per confirmed Bug Fix |
| Availability | On-Call Coverage | 100% Shift Coverage (Primary + Secondary) |
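
The quantitative rows of the checklist can be checked automatically from ticket counts. The sketch below covers three of them (L1 resolution rate, escalation quality, Sev 1 response time); the `SupportStats` fields are hypothetical and would be filled from your ticketing system's reports.

```python
from dataclasses import dataclass


@dataclass
class SupportStats:
    # Hypothetical reporting-period totals (assumption).
    total_tickets: int
    resolved_at_l1: int
    l2_to_l3_escalations: int
    escalations_with_logs_and_repro: int
    slowest_sev1_response_min: float  # use 0 if there were no Sev 1 events


def checklist_results(stats: SupportStats) -> dict[str, bool]:
    """Evaluate the measurable checklist thresholds for one period."""
    l1_rate = stats.resolved_at_l1 / stats.total_tickets if stats.total_tickets else 0.0
    return {
        "L1 resolution rate >= 60%": l1_rate >= 0.60,
        "Escalations with logs + repro = 100%": (
            stats.escalations_with_logs_and_repro == stats.l2_to_l3_escalations
        ),
        "Sev 1 response <= 15 min": stats.slowest_sev1_response_min <= 15,
    }


if __name__ == "__main__":
    period = SupportStats(200, 130, 10, 10, 12.0)
    for rule, passed in checklist_results(period).items():
        print("PASS" if passed else "FAIL", rule)
```

The remaining rows (vendor tickets, privileged access, KEDB updates, on-call coverage) are policy controls that are better verified by audit than by a ratio.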