2.4 Deployment Architecture
Cloud-first architectures fail on the factory floor because the internet is not a real-time control network. Latency, jitter, and outages are physical realities. The operational architecture must follow the "Submarine Principle": the factory must be able to operate autonomously, with no loss of data integrity, even when cut off from the outside world.
The Edge Collector Strategy
Do not connect high-frequency machine telemetry directly to a central database. The bandwidth cost is wasteful, and the latency is unacceptable. Deploy Edge Collectors (Industrial PCs or Gateways) at the machine level (Level 1/2).
Responsibilities of the Edge:
- Poll: Query the PLC at high frequency (every 10 ms to 100 ms).
- Normalize: Convert raw register 40001 to Oven_Temp_Zone1.
- Filter: Report "Change by Exception" (Deadband) to reduce noise.
- Buffer: Store data locally if the uplink fails.
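The Normalize and Filter steps can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the TAG_MAP contents, the 0.1 scale factor, the 300-degree span, and the names normalize and DeadbandFilter are all assumptions made for the example. The 0.5% deadband matches the default in the checklist at the end of this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical register-to-tag map; real addresses come from the PLC program.
TAG_MAP: Dict[int, str] = {40001: "Oven_Temp_Zone1"}


def normalize(register: int, raw: int, scale: float = 0.1) -> Tuple[str, float]:
    """Turn a raw register reading into a named engineering value.

    The 0.1 scale factor (tenths of a degree) is assumed for illustration.
    """
    return TAG_MAP[register], raw * scale


@dataclass
class DeadbandFilter:
    """Change by Exception: pass a value only if it moved beyond the deadband."""
    span: float                  # engineering range of the signal, e.g. 300.0
    deadband_pct: float = 0.5    # percent of span, configurable
    _last_sent: Dict[str, float] = field(default_factory=dict)

    def should_send(self, tag: str, value: float) -> bool:
        threshold = self.span * self.deadband_pct / 100.0
        last = self._last_sent.get(tag)
        # Always send the first sample; afterwards compare against the
        # last value that was actually transmitted, not the last one polled.
        if last is None or abs(value - last) > threshold:
            self._last_sent[tag] = value
            return True
        return False
```

Comparing against the last transmitted value, rather than the last polled value, prevents a slow drift from slipping through unreported in steps that are each smaller than the deadband.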
The "1-to-N" Ratio
- Complex Machines (SMT, CNC): 1 Edge Gateway per Machine.
- Simple Assets (Conveyors, Scales): 1 Edge Gateway per Line (aggregating multiple IO blocks).
Time Synchronization (NTP)
Genealogy relies on chronology. If Machine A thinks it is 12:00:00 and Machine B thinks it is 11:59:50, you cannot prove which process happened first. Do not rely on the default Windows Time service configuration for industrial precision.
The Standard:
- Protocol: NTPv4 (Network Time Protocol).
- Source: Local Stratum-1 or Stratum-2 server, disciplined by (or one hop from) a GPS or atomic reference clock. Do not rely on public internet pools (pool.ntp.org) for the OT network.
- Drift Tolerance: Maximum ±500ms deviation.
Drift Logic:
- If |Offset| > 500 ms → Then flag Data Quality as "Suspect".
- If |Offset| > 2000 ms → Then trigger a Maintenance Alert. Check the CMOS battery on the IPC.
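The drift rules reduce to a small classification function. The sketch below assumes the offset (in milliseconds, positive or negative) is already available from the local NTP client; how it is obtained is outside the example, and the function name and return shape are hypothetical.

```python
from typing import Tuple

SUSPECT_MS = 500.0        # drift tolerance from the standard above
MAINTENANCE_MS = 2000.0   # beyond this, suspect the hardware clock


def classify_offset(offset_ms: float) -> Tuple[str, bool]:
    """Return (data_quality, maintenance_alert) for a measured clock offset."""
    magnitude = abs(offset_ms)  # a clock running fast is as bad as one running slow
    quality = "Suspect" if magnitude > SUSPECT_MS else "Good"
    return quality, magnitude > MAINTENANCE_MS
```

For example, classify_offset(-750.0) marks the data as "Suspect" without raising a maintenance alert, while classify_offset(2500.0) does both.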
Store-and-Forward (Buffering)
The network will fail. When it does, data must flow into a local reservoir, not onto the floor.
Buffering Capacity Rules:
- Target: Minimum 72 Hours of local retention. (Enough to survive a weekend outage).
- Storage Medium: Industrial SSD (high TBW rating), with the buffer held in a local store such as a SQLite database.
- Reconnection Logic:
  - LIFO (Last In, First Out) for Status: The dashboard needs the current state immediately.
  - FIFO (First In, First Out) for History: Backfill the historical gaps in chronological order.
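One way to picture the reconnection order is a function that sorts the buffered records into an upload plan. This is a sketch under stated assumptions: the record layout (dicts with kind, asset, and ts keys) and the name plan_replay are invented for the example, and a real collector would stream from its local store rather than sort a list in memory.

```python
from typing import Dict, List


def plan_replay(buffered: List[Dict]) -> List[Dict]:
    """Order buffered records for upload once the uplink returns.

    Each record is assumed to carry 'kind' ('status' or 'history'),
    'asset', and 'ts' (epoch seconds).
    """
    status = [r for r in buffered if r["kind"] == "status"]
    history = [r for r in buffered if r["kind"] == "history"]

    # LIFO for status: newest first, so the dashboard shows the
    # current state before anything else arrives.
    status.sort(key=lambda r: r["ts"], reverse=True)

    # FIFO for history: oldest first, so gaps fill in chronological order.
    history.sort(key=lambda r: r["ts"])

    return status + history
```

Sending all status records ahead of history is the simplest reading of the rule; interleaving the two streams with a priority queue would be a reasonable refinement on a slow uplink.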
Data Loss Strategy (The "Full Disk" Scenario):
- If Buffer > 90% Full → Then Trigger Critical IT Ops Alert.
- If Buffer = 100% Full:
  - Traceability Data (Serial #s, Pass/Fail): STOP THE LINE. Compliance data cannot be discarded.
  - Telemetry Data (Amps, Volts, Temps): Overwrite Oldest (Ring Buffer).
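The two overflow behaviors can be sketched as follows. The example makes simplifying assumptions that a production buffer would not: capacity is counted in record slots rather than bytes, the two data classes live in separate partitions, and the class and exception names (EdgeBuffer, LineStopRequired) are hypothetical.

```python
from collections import deque
from typing import List


class LineStopRequired(RuntimeError):
    """Raised when compliance data can no longer be stored."""


class EdgeBuffer:
    def __init__(self, telemetry_slots: int, traceability_slots: int):
        # deque(maxlen=N) silently drops the oldest item on overflow,
        # which is exactly the ring-buffer behavior wanted for telemetry.
        self.telemetry: deque = deque(maxlen=telemetry_slots)
        self.traceability: List[dict] = []
        self.traceability_slots = traceability_slots
        self.critical_alert = False

    def fill_ratio(self) -> float:
        used = len(self.telemetry) + len(self.traceability)
        return used / (self.telemetry.maxlen + self.traceability_slots)

    def _check_alert(self) -> None:
        if self.fill_ratio() > 0.90:
            self.critical_alert = True  # stand-in for paging IT Ops

    def put_telemetry(self, sample: dict) -> None:
        self.telemetry.append(sample)   # overwrites oldest when full
        self._check_alert()

    def put_traceability(self, record: dict) -> None:
        if len(self.traceability) >= self.traceability_slots:
            # Never discard compliance data: refuse the write and halt.
            raise LineStopRequired("traceability buffer full")
        self.traceability.append(record)
        self._check_alert()
```

The caller is expected to treat LineStopRequired as a signal to interlock the line, not as an error to log and ignore.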
Monitoring the Monitors
An Edge Collector that has silently crashed is worse than no collector at all. You need a "Heartbeat" mechanism.
Watchdog Logic:
- Heartbeat: Edge sends a "Keep-Alive" pulse every 60 seconds.
- Latency Check: Measure Time_Sent vs. Time_Received.
- Resource Thresholds:
  - CPU: Alert if > 80% for 15 mins.
  - RAM: Alert if > 90%.
  - Disk: Alert if Free Space < 20%.
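A server-side view of the watchdog rules might look like the sketch below. The function names and the sample format are assumptions; the 180-second timeout corresponds to three missed 60-second heartbeats, as in the checklist at the end of this section. Note that the latency figure is only meaningful if both clocks are synchronized.

```python
from typing import List, Tuple

HEARTBEAT_TIMEOUT_S = 180.0   # three missed 60-second pulses


def heartbeat_missing(last_received: float, now: float) -> bool:
    return now - last_received > HEARTBEAT_TIMEOUT_S


def heartbeat_latency(time_sent: float, time_received: float) -> float:
    """One-way latency in seconds; trustworthy only with synced clocks."""
    return time_received - time_sent


def cpu_sustained_high(samples: List[Tuple[float, float]], now: float,
                       limit_pct: float = 80.0,
                       window_s: float = 900.0) -> bool:
    """samples: (epoch_s, cpu_pct) pairs ordered oldest to newest."""
    streak_start = None
    # Walk back from the newest sample until CPU was at or below the limit.
    for ts, pct in reversed(samples):
        if pct <= limit_pct:
            break
        streak_start = ts
    return streak_start is not None and now - streak_start >= window_s


def resource_alerts(cpu_samples: List[Tuple[float, float]], now: float,
                    ram_pct: float, disk_free_pct: float) -> List[str]:
    alerts = []
    if cpu_sustained_high(cpu_samples, now):
        alerts.append("cpu")
    if ram_pct > 90.0:
        alerts.append("ram")
    if disk_free_pct < 20.0:
        alerts.append("disk")
    return alerts
```

Requiring an unbroken 15-minute streak keeps short CPU spikes, such as a backfill burst after reconnection, from paging anyone.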
Pro-Tip: Use a "Store-and-Forward" flag in your message payload. When backfilling data, mark it as delayed_ingest = true. This prevents your analytics engine from triggering false "Latency Alarms" when processing historical records.
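As a rough sketch of the flag in use, the example below builds a JSON payload on the edge and checks it on the analytics side. Apart from delayed_ingest, the field names are invented for the example, as are the function names and the 5-second latency limit.

```python
import json


def build_payload(tag: str, value: float, ts: float, backfill: bool) -> str:
    """Edge side: mark records replayed from the local buffer."""
    return json.dumps({
        "tag": tag,
        "value": value,
        "ts": ts,                    # time the sample was taken (epoch s)
        "delayed_ingest": backfill,  # True only when replaying the buffer
    })


def latency_alarm(payload: str, received_ts: float,
                  limit_s: float = 5.0) -> bool:
    """Analytics side: ignore age on records known to be backfilled."""
    msg = json.loads(payload)
    if msg.get("delayed_ingest", False):
        return False
    return received_ts - msg["ts"] > limit_s
```

A live sample that arrives 10 seconds late still raises the alarm, while an hour-old record replayed from the buffer does not.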
Final Checklist
| Category | Metric / Control | Threshold / Rule |
|---|---|---|
| Time | NTP Sync | Local Stratum-2 Source. Max Drift < 500 ms. |
| Resilience | Buffer Capacity | Min. 72 Hours of local storage at full data rate. |
| Continuity | Offline Mode | Traceability = Stop Line; Telemetry = Ring Buffer. |
| Load | Deadbands | Only transmit analog changes > 0.5% (Configurable). |
| Health | Watchdog | Server alerts if Edge Heartbeat missing > 3 mins. |
| Hardware | Specs | Industrial Grade (Fanless, SSD). No Office PCs. |
| Recovery | Backfill Order | Prioritize Live Status, then backfill History. |