5.2 Change & release management
A manufacturing facility operates under different constraints from a software startup. A “Move fast and break things” approach can translate directly into a stopped line and lost revenue. The primary goal of Release Management is Stability. Every change to the MES or ERP should be treated with the utmost care: carefully packaged, systematically routed, and easily reversible.
The three-tier environment
It is standard practice to avoid developing or testing directly on the production server. Strict isolation between environments is fundamental to preventing data corruption and downtime.
DEV (development)
- Purpose: The Sandbox. Developers write code and break things here.
- Data: Synthetic / Dummy data.
- Access: Full Admin rights for Developers.
- SLA: None.
UAT (user acceptance testing / staging)
- Purpose: The Mirror. This environment serves as an exact replica of the Production hardware and software configuration.
- Data: An anonymized copy of Production data (ideally refreshed monthly).
- Access: Read-only for Developers; Read/Write for Key Users (Testers).
- Logic: When a change succeeds in DEV but fails in UAT, the release is rejected. This typically indicates environment configuration drift.
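Because a DEV-pass / UAT-fail result usually points to drift, it can help to diff the configuration of two environments before blaming the code. The sketch below assumes each environment's settings can be exported as a flat key/value dictionary; the function `config_drift`, the `MISSING` placeholder, and the example setting names are hypothetical.

```python
from typing import Any

MISSING = "<missing>"  # placeholder for a setting absent from one environment


def config_drift(reference: dict[str, Any], candidate: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (reference_value, candidate_value)} for every setting that differs."""
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref_val = reference.get(key, MISSING)
        cand_val = candidate.get(key, MISSING)
        if ref_val != cand_val:
            drift[key] = (ref_val, cand_val)
    return drift


# Hypothetical exported settings: PROD is the reference, UAT is checked against it.
prod = {"mes_version": "2.3.1", "label_printer_driver": "5.1", "db_compat_level": 150}
uat = {"mes_version": "2.3.1", "label_printer_driver": "4.9", "db_compat_level": 150}
print(config_drift(prod, uat))  # {'label_printer_driver': ('5.1', '4.9')}
```

Comparing UAT against PROD (rather than against DEV) matches the intent that UAT mirrors Production; an empty result means the failure is more likely in the change itself.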
PROD (production)
- Purpose: The Live Manufacturing Environment.
- Data: Live, real-time master data and daily transactions.
- Access: Developers should generally have zero write access. Changes are applied through automated deployment scripts or by System Admins.
- Rule: “Hot-fixing” directly in PROD introduces significant risk and should be heavily restricted.
The release gate: governance logic
Code should not migrate from UAT to PROD on verbal assurance alone. It moves only on documented Evidence.
The request for change (RFC)
Every release must have a ticket containing:
- Impact Analysis: Which lines/modules are affected?
- Test Evidence: Screenshots/Logs of the passing UAT run.
- Rollback Plan: The exact script to undo the change if it fails.
- Timing: Estimated downtime.
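An RFC can be checked mechanically before it reaches an approver. This is a minimal sketch assuming the ticket has been exported into a Python dataclass; the class `RequestForChange`, its field names, and the helper `incomplete_sections` are illustrative and not part of any specific ticketing tool.

```python
from dataclasses import dataclass


@dataclass
class RequestForChange:
    impact_analysis: str          # which lines / modules are affected
    test_evidence: list[str]      # references to UAT screenshots or logs
    rollback_plan: str            # the exact undo script, or a reference to it
    estimated_downtime_min: int   # expected downtime in minutes (0 is allowed)


def incomplete_sections(rfc: RequestForChange) -> list[str]:
    """Return the mandatory RFC sections that are empty or invalid."""
    problems = []
    if not rfc.impact_analysis.strip():
        problems.append("impact_analysis")
    if not rfc.test_evidence:
        problems.append("test_evidence")
    if not rfc.rollback_plan.strip():
        problems.append("rollback_plan")
    if rfc.estimated_downtime_min < 0:
        problems.append("estimated_downtime_min")
    return problems


rfc = RequestForChange(
    impact_analysis="Line 3 labelling module",
    test_evidence=["uat_run_0412.log"],
    rollback_plan="rollback_v1_4_2.sql",
    estimated_downtime_min=10,
)
print(incomplete_sections(rfc))  # []
```

An empty list means the ticket is complete enough to route for approval; it says nothing about whether the evidence itself is convincing, which remains the approver's call.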
The approval matrix
- Minor Patch (Bug fix): IT Manager Approval.
- Feature Release (New Logic): IT Manager + Operations Manager Approval.
- Major Upgrade (Architecture): CIO + Plant Director Approval.
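The approval matrix maps naturally onto a lookup table. This sketch assumes approvals are recorded on the ticket as role names; `ChangeType`, `REQUIRED_APPROVERS`, and `missing_approvals` are hypothetical names.

```python
from enum import Enum


class ChangeType(Enum):
    MINOR_PATCH = "minor_patch"          # bug fix
    FEATURE_RELEASE = "feature_release"  # new logic
    MAJOR_UPGRADE = "major_upgrade"      # architecture change


# Required sign-offs per change type, taken from the approval matrix.
REQUIRED_APPROVERS: dict[ChangeType, frozenset[str]] = {
    ChangeType.MINOR_PATCH: frozenset({"IT Manager"}),
    ChangeType.FEATURE_RELEASE: frozenset({"IT Manager", "Operations Manager"}),
    ChangeType.MAJOR_UPGRADE: frozenset({"CIO", "Plant Director"}),
}


def missing_approvals(change_type: ChangeType, approvals: set[str]) -> set[str]:
    """Return the roles that still need to sign off before the release gate opens."""
    return set(REQUIRED_APPROVERS[change_type] - approvals)


print(missing_approvals(ChangeType.FEATURE_RELEASE, {"IT Manager"}))  # {'Operations Manager'}
```

The gate opens only when the returned set is empty; keeping the matrix as data makes it easy to audit against the written policy.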
Release Windows & blackout periods
Timing is critical. Avoid deploying when the factory is most vulnerable or when support coverage is thin.
The “safe” window
- Time: Tuesday, Wednesday, or Thursday. 09:00 – 11:00 or 14:00 – 16:00.
- Why: IT Support is in the office. Operations leadership is present.
- Logic: Deployments require “All Hands on Deck.”
The restriction zone (blackouts)
- Fridays: Deploying on Friday increases the risk of requiring emergency weekend support.
- Shift Change (e.g. 06:00, 14:00, 22:00): These periods inherently have high operational noise and distraction.
- Peak Season / End of Quarter: System changes must be limited when maximum throughput is required to meet customer commitments.
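The window and blackout rules can be encoded as a pre-deployment check. This is a minimal sketch assuming the example shift times above, a hypothetical 30-minute buffer around each shift change, and a plant-maintained set of blackout dates for peak season and quarter end; `in_safe_window` and the constant names are illustrative. The planned deployment (start plus duration) must fit entirely inside one safe slot.

```python
from datetime import date, datetime, time, timedelta

SAFE_DAYS = {1, 2, 3}  # Tuesday, Wednesday, Thursday (datetime.weekday(): Monday == 0)
SAFE_SLOTS = [(time(9, 0), time(11, 0)), (time(14, 0), time(16, 0))]
SHIFT_CHANGES = [time(6, 0), time(14, 0), time(22, 0)]  # example shift times only
SHIFT_BUFFER = timedelta(minutes=30)  # hypothetical quiet period around each change


def in_safe_window(start: datetime, duration: timedelta, blackout_dates: set[date]) -> bool:
    """Return True if [start, start + duration] fits a safe slot and avoids all blackouts."""
    end = start + duration
    # Same-day deployments on Tue-Thu only (this also excludes Fridays).
    if start.date() != end.date() or start.weekday() not in SAFE_DAYS:
        return False
    if start.date() in blackout_dates:  # e.g. peak season, end of quarter
        return False
    # The whole deployment must fit inside one safe slot.
    if not any(lo <= start.time() and end.time() <= hi for lo, hi in SAFE_SLOTS):
        return False
    # Reject any overlap with a buffered shift change.
    for change in SHIFT_CHANGES:
        change_dt = datetime.combine(start.date(), change)
        if start < change_dt + SHIFT_BUFFER and end > change_dt - SHIFT_BUFFER:
            return False
    return True


print(in_safe_window(datetime(2024, 6, 4, 9, 30), timedelta(minutes=60), set()))  # True (Tuesday)
print(in_safe_window(datetime(2024, 6, 7, 9, 30), timedelta(minutes=60), set()))  # False (Friday)
```

With the example shift times and the assumed buffer, a deployment starting at 14:00 is rejected because it overlaps the 14:00 shift change; under these assumptions the usable afternoon slot effectively begins at 14:30.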
Versioning strategy
Semantic Versioning (Major.Minor.Patch) must be adopted to communicate risk to the business.
- Major (v2.0.0): Breaking Change. Requires downtime and Operator Retraining.
- Minor (v1.1.0): New Feature. Backward compatible. No downtime required.
- Patch (v1.0.1): Bug Fix. Invisible to the user.
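Deriving the release class from the version numbers keeps the communicated risk consistent: the highest component that changed determines it. The sketch below handles plain `Major.Minor.Patch` strings with an optional `v` prefix only (no pre-release or build metadata); `parse_version`, `release_level`, and `IMPACT` are hypothetical names.

```python
import re
from typing import NamedTuple

_SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")

# Business-facing meaning of each release level, as defined above.
IMPACT = {
    "major": "Breaking change: downtime and operator retraining required",
    "minor": "New feature: backward compatible, no downtime",
    "patch": "Bug fix: invisible to the user",
}


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'v1.2.3' or '1.2.3'; pre-release and build suffixes are rejected."""
    match = _SEMVER.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a Major.Minor.Patch version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def release_level(current: str, target: str) -> str:
    """Classify an upgrade by the highest version component that changed."""
    old, new = parse_version(current), parse_version(target)
    if new <= old:  # NamedTuple compares field by field, like a tuple
        raise ValueError(f"{target} is not newer than {current}")
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    return "patch"


level = release_level("v1.0.0", "v1.1.0")
print(level, "-", IMPACT[level])  # minor - New feature: backward compatible, no downtime
```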
Rollback rules (the “undo” button)
A deployment without a reliable rollback plan introduces significant operational risk. The team must be able to return the system to its previous known-good state, ideally in under 15 minutes.
The snapshot rule
- Virtual Machines: A full VM Snapshot must be taken before applying the update.
- Database: A Transaction Log backup must be taken immediately before the script runs.
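A deployment script can refuse to start unless both safety copies exist. This sketch assumes the snapshot and backup timestamps can be read from your hypervisor and database tooling (fetching them is not shown), that both copies are taken in the same pre-deployment step, and uses a hypothetical 10-minute limit to interpret "immediately before"; `preflight_ok` and `MAX_AGE` are illustrative names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical limit for "immediately before": both copies must be at most this old.
MAX_AGE = timedelta(minutes=10)


def preflight_ok(
    vm_snapshot_at: Optional[datetime],
    log_backup_at: Optional[datetime],
    deploy_start: datetime,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Return True only if the VM snapshot and the transaction log backup both exist,
    were taken before deploy_start, and are no older than max_age."""
    for taken_at in (vm_snapshot_at, log_backup_at):
        if taken_at is None:  # copy was never taken
            return False
        if taken_at > deploy_start:  # taken after the update began: no clean restore point
            return False
        if deploy_start - taken_at > max_age:  # too stale to count as "immediately before"
            return False
    return True


start = datetime(2024, 6, 4, 9, 30)
print(preflight_ok(start - timedelta(minutes=8), start - timedelta(minutes=2), start))  # True
print(preflight_ok(None, start - timedelta(minutes=2), start))  # False
```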
The 15-minute timer
- Trigger: Deployment begins.
- Check: At T+15 minutes, execute the “Smoke Test” (e.g. print one label, successfully complete one cycle).
- Logic:
  - When the Smoke Test passes, commit the change.
  - When the Smoke Test fails, or if overall performance degrades significantly (e.g. > 20%), the team should execute the rollback immediately. It is generally safer to revert than to attempt live troubleshooting during production hours.
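Writing the T+15 rule down as code removes room for debate while a deployment is going wrong. This is a minimal sketch assuming "performance" is measured as cycle time against a pre-deployment baseline and using the 20% example threshold from the text; `SmokeResult`, `decide`, and `MAX_DEGRADATION` are hypothetical names.

```python
from dataclasses import dataclass

# Example degradation threshold from the text (> 20 % triggers rollback).
MAX_DEGRADATION = 0.20


@dataclass
class SmokeResult:
    label_printed: bool       # e.g. one test label printed
    cycle_completed: bool     # e.g. one production cycle completed
    baseline_cycle_s: float   # cycle time before the deployment, in seconds
    current_cycle_s: float    # cycle time measured at T+15, in seconds


def decide(result: SmokeResult, max_degradation: float = MAX_DEGRADATION) -> str:
    """Return 'commit' or 'rollback' for the T+15 checkpoint."""
    if result.baseline_cycle_s <= 0:
        raise ValueError("baseline cycle time must be positive")
    # Any failed smoke step means an immediate rollback.
    if not (result.label_printed and result.cycle_completed):
        return "rollback"
    # Relative slowdown: 0.25 means the cycle is 25 % slower than the baseline.
    degradation = (result.current_cycle_s - result.baseline_cycle_s) / result.baseline_cycle_s
    if degradation > max_degradation:
        return "rollback"
    return "commit"


print(decide(SmokeResult(True, True, baseline_cycle_s=60.0, current_cycle_s=75.0)))  # rollback
```

Here a 60 s cycle that slows to 75 s is a 25% degradation, which exceeds the example threshold, so the sketch returns `rollback` even though both smoke steps passed.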
Recap: Change & Release Management
| Parameter | Requirement | Value / Criterion | Action / Condition |
|---|---|---|---|
| Environment Migration Path | UAT to PROD only | UAT must be an exact hardware/software replica of PROD | Reject release if change passes DEV but fails UAT |
| Release Governance | Mandatory RFC with approval | RFC contains: Impact Analysis, UAT Test Evidence, Rollback Plan, Estimated Downtime. Approval per matrix (e.g., Minor Patch: IT Manager) | Deploy only with approved RFC and documented evidence |
| Deployment Window | Within “Safe” window only | Tue-Thu, 09:00-11:00 or 14:00-16:00. No deployments on Fridays, during shift changes, or peak seasons | Schedule all deployments within this window |
| Rollback Execution | Mandatory, time-boxed | Full VM snapshot & DB backup pre-deployment. Execute rollback if Smoke Test fails or performance degrades >20% at T+15 minutes | Revert system to previous state within 15 minutes of failure |
| System Versioning | Semantic Versioning enforced | Major.Minor.Patch. Version number must be displayed in MES interface footer | Use version to triage user-reported issues |