2.4 Deployment architecture
Cloud-first architectures fail on the factory floor because the internet is not a real-time control network. Latency, jitter, and outages are physical realities. The operational architecture must follow the “Submarine Principle”: the factory must be able to operate autonomously, with no loss of data integrity, even when cut off from the outside world.
The edge collector strategy
Avoid connecting high-frequency machine telemetry directly to a central database. Doing so uses bandwidth inefficiently, and the added latency can disrupt real-time operations. The preferred approach is to deploy Edge Collectors (such as Industrial PCs or specialized Gateways) directly at the machine level (Level 1/2).
Responsibilities of the Edge:
- Poll: Query the PLC at high frequency (10ms - 100ms).
- Normalize: Map raw addresses to named tags, e.g. register 40001 becomes Oven_Temp_Zone1.
- Filter: Report by exception, forwarding a value only when it changes by more than a configured deadband, to reduce noise.
- Buffer: Store data locally if the uplink fails.
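The Normalize and Filter steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: `TAG_MAP`, `DeadbandFilter`, and `normalize_and_filter` are hypothetical names, and the 0.5-unit deadband in the usage note is an arbitrary assumption.

```python
from dataclasses import dataclass, field

# Hypothetical register-to-tag map; real addresses and names come from the PLC program.
TAG_MAP = {40001: "Oven_Temp_Zone1"}


@dataclass
class DeadbandFilter:
    """Report by exception: accept a value only if it moved more than `deadband`."""
    deadband: float
    _last_reported: dict = field(default_factory=dict)

    def accept(self, tag: str, value: float) -> bool:
        last = self._last_reported.get(tag)
        # The first value for a tag is always reported; later ones must leave the deadband.
        if last is None or abs(value - last) > self.deadband:
            self._last_reported[tag] = value
            return True
        return False


def normalize_and_filter(samples, filt):
    """Map (register, value) pairs to named tags and drop changes inside the deadband."""
    out = []
    for register, value in samples:
        tag = TAG_MAP.get(register)
        if tag is None:
            continue  # unmapped register: ignored in this sketch
        if filt.accept(tag, value):
            out.append((tag, value))
    return out
```

With `DeadbandFilter(deadband=0.5)`, the readings 200.0, 200.2, 200.6, 200.7 on register 40001 yield only two reports (200.0 and 200.6), because the deadband is measured against the last reported value rather than the last polled value.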
The “1-to-n” ratio
- Complex Machines (e.g. SMT, CNC): Allocate 1 Edge Gateway per Machine.
- Simple Assets (e.g. Conveyors, Scales): Allocate 1 Edge Gateway per Line (aggregating data from multiple IO blocks).
Time synchronization (NTP)
Genealogy relies on chronology. If Machine A thinks it is 12:00:00 and Machine B thinks it is 11:59:50, you cannot prove which process happened first. The default Windows Time configuration is generally insufficient for industrial precision.
The standard
- Protocol: Use NTPv4 (Network Time Protocol, version 4).
- Source: Rely on a Local Stratum-1 Server (one disciplined directly by a GPS receiver or an Atomic Clock, which are themselves Stratum-0 references). Do not depend on public internet pools (like pool.ntp.org) for the isolated OT network.
- Drift Tolerance: Maintain a maximum deviation of ±500ms.
Drift logic
- When the time offset exceeds 500ms, the system should flag the Data Quality as “Suspect”.
- When the offset exceeds 2000ms, the system should trigger a Maintenance Alert. This significant drift typically indicates a hardware issue, such as a failing CMOS battery on the IPC.
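The two drift thresholds above can be expressed as a small classification function. This is a sketch under assumptions: the function name `classify_offset` and the “Good” quality label are hypothetical (the text only names “Suspect”), and the offset is assumed to have already been measured by the NTP client.

```python
from typing import Tuple

SUSPECT_MS = 500   # beyond this, flag Data Quality as "Suspect"
ALERT_MS = 2000    # beyond this, also raise a Maintenance Alert


def classify_offset(offset_ms: float) -> Tuple[str, bool]:
    """Return (data_quality, maintenance_alert) for a measured clock offset."""
    magnitude = abs(offset_ms)  # drift matters in both directions
    if magnitude > ALERT_MS:
        return ("Suspect", True)
    if magnitude > SUSPECT_MS:
        return ("Suspect", False)
    return ("Good", False)  # "Good" is an assumed label, not taken from the text
```

For the example given earlier, Machine B running 10 seconds behind corresponds to `classify_offset(-10000)`, which returns `("Suspect", True)`.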
Store-and-forward (buffering)
The network will fail. When it does, data must flow into a local reservoir, not onto the floor.
Buffering capacity rules
- Target: Design for a minimum of 72 hours of local retention. (This is generally sufficient to survive a weekend network outage.)
- Storage Medium: Use an Industrial SSD with a high TBW (terabytes written) rating. A local SQLite database is one option for the buffer store itself.
- Reconnection Logic:
  - LIFO (Last In, First Out) for Status: The dashboard requires the most current state information immediately upon reconnection.
  - FIFO (First In, First Out) for History: The system should then backfill the historical data gaps in strict chronological order.
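The reconnection order above can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `buffer` table layout, the function names, and the `send(kind, row)` callback are assumptions, and `send` is assumed to raise an exception when the uplink fails.

```python
import sqlite3


def open_buffer(path=":memory:"):
    """Open (or create) the local buffer. Use a file path on the edge device."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS buffer ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " ts REAL NOT NULL, tag TEXT NOT NULL, value REAL NOT NULL)"
    )
    return db


def store(db, ts, tag, value):
    """Buffer one reading locally while the uplink is down."""
    db.execute("INSERT INTO buffer (ts, tag, value) VALUES (?, ?, ?)", (ts, tag, value))
    db.commit()


def replay(db, send):
    """On reconnect: newest value per tag first, then the full history oldest-first."""
    # Step 1 (LIFO for status): the most recently buffered row for each tag.
    latest = db.execute(
        "SELECT ts, tag, value FROM buffer"
        " WHERE id IN (SELECT MAX(id) FROM buffer GROUP BY tag)"
        " ORDER BY tag"
    ).fetchall()
    for row in latest:
        send("status", row)
    # Step 2 (FIFO for history): backfill in chronological order.
    rows = db.execute("SELECT id, ts, tag, value FROM buffer ORDER BY ts, id").fetchall()
    for row_id, ts, tag, value in rows:
        send("history", (ts, tag, value))
        # Delete only after send() returned, so a failed send keeps the row buffered.
        db.execute("DELETE FROM buffer WHERE id = ?", (row_id,))
        db.commit()
```

Committing after each delete trades throughput for safety: if the uplink drops again mid-replay, only rows that were actually sent have been removed. A production version would likely batch these deletes.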
Data loss strategy (the “full disk” scenario)
- When the local buffer exceeds 90% capacity, the system should trigger a Critical IT Ops Alert.
- When the buffer reaches 100% capacity:
  - Traceability Data (Serial #s, Pass/Fail): The system should safely stop the line immediately. Compliance data is critical and must not be discarded.
  - Telemetry Data (Amps, Volts, Temps): The system should overwrite the oldest telemetry data first, following a ring-buffer policy.
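The full-disk rules above amount to a small decision table. The sketch below is one possible encoding; `buffer_action` and its return strings are hypothetical names, and capacity is counted in records purely for simplicity (a real collector would more likely track bytes).

```python
from collections import deque

ALERT_FRACTION = 0.90  # above this fill level, raise a Critical IT Ops Alert


def buffer_action(used: int, capacity: int, is_traceability: bool) -> str:
    """Decide what to do with the next record given current buffer usage."""
    if used >= capacity:
        # Full: compliance data is never dropped, telemetry may be recycled.
        return "stop_line" if is_traceability else "overwrite_oldest"
    if used / capacity > ALERT_FRACTION:
        return "store_and_alert"
    return "store"


# deque(maxlen=n) behaves as a ring buffer: appending to a full deque
# silently drops the oldest item.
telemetry_ring = deque(maxlen=3)
for amps in (4.1, 4.2, 4.3, 4.4):
    telemetry_ring.append(amps)
# telemetry_ring now holds 4.2, 4.3, 4.4
```

Keeping the policy in a pure function makes it easy to test the thresholds separately from the storage layer.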
Monitoring the monitors
An Edge Collector that has silently crashed is worse than no collector at all. A “Heartbeat” mechanism is required.
Watchdog logic
- Heartbeat: The Edge device sends a “Keep-Alive” pulse regularly (e.g. every 60 seconds).
- Latency Check: The centralized system compares Time_Sent with Time_Received to estimate uplink latency. The comparison is only meaningful when both clocks are synchronized.
- Resource Thresholds:
  - CPU: An alert must be triggered if utilization is > 80% for more than 15 minutes.
  - RAM: An alert must be triggered if utilization is > 90%.
  - Disk: An alert must be triggered if Free Space drops below 20%.
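A central watchdog applying the checks above might look like the following sketch. The `Heartbeat` fields, the alert strings, and `MISSED_BEATS_ALLOWED` are assumptions made for illustration. In particular, the text does not say how many missed pulses should be tolerated before alerting, and the edge is assumed to report how long its CPU has been above the threshold.

```python
from dataclasses import dataclass
from typing import List

HEARTBEAT_INTERVAL_S = 60
MISSED_BEATS_ALLOWED = 2  # assumption: alert after two missed intervals


@dataclass
class Heartbeat:
    time_sent: float         # edge clock, epoch seconds
    time_received: float     # central clock, epoch seconds
    cpu_pct: float
    cpu_high_minutes: float  # minutes CPU has stayed above 80%, reported by the edge
    ram_pct: float
    disk_free_pct: float


def latency_s(hb: Heartbeat) -> float:
    """Uplink latency estimate; only meaningful if both clocks are NTP-synchronized."""
    return hb.time_received - hb.time_sent


def check_heartbeat(hb: Heartbeat, now: float) -> List[str]:
    """Return the alerts raised by the last heartbeat as seen at time `now`."""
    alerts = []
    if now - hb.time_received > HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED:
        alerts.append("collector silent")
    if hb.cpu_pct > 80 and hb.cpu_high_minutes > 15:
        alerts.append("cpu")
    if hb.ram_pct > 90:
        alerts.append("ram")
    if hb.disk_free_pct < 20:
        alerts.append("disk")
    return alerts
```

Running `check_heartbeat` on a timer, rather than only when a pulse arrives, is what lets the central system notice a collector that has stopped sending altogether.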
Recap: Edge Infrastructure Deployment Parameters
| Component | Parameter | Requirement | Action on Violation |
|---|---|---|---|
| Edge Collector | Polling Frequency | 10ms - 100ms | — |
| Time Synchronization (NTP) | Clock Drift | ≤ ±500ms | >500ms: Flag data as “Suspect”. >2000ms: Trigger Maintenance Alert. |
| Local Buffer | Retention Capacity | ≥ 72 hours | >90% capacity: Trigger Critical IT Ops Alert. 100% capacity: Halt line for traceability data; overwrite oldest telemetry. |
| Edge Health | Heartbeat Interval | 60 seconds | Missing pulse: Alert for collector failure. |
| Edge Resources | CPU / RAM / Disk | CPU >80% for no more than 15 min; RAM ≤90%; Disk Free ≥20% | Trigger alert for threshold violation. |