2.4 Deployment Architecture

Cloud-first architectures fail on the factory floor because the internet is not a real-time control network. Latency, jitter, and outages are physical realities. The operational architecture must follow the "Submarine Principle": the factory must be able to operate autonomously, retaining all data integrity, even when cut off from the outside world.

The Edge Collector Strategy

Do not connect high-frequency machine telemetry directly to a central database. The bandwidth cost is wasteful, and the latency is unacceptable. Deploy Edge Collectors (Industrial PCs or Gateways) at the machine level (Level 1/2).

Responsibilities of the Edge:

Poll: Query the PLC at high frequency (10ms - 100ms).
Normalize: Convert raw register 40001 to Oven_Temp_Zone1.
Filter: Report "Change by Exception" (Deadband) to reduce noise.
Buffer: Store data locally if the uplink fails.

The "1-to-N" Ratio

Complex Machines (SMT, CNC): 1 Edge Gateway per Machine.
Simple Assets (Conveyors, Scales): 1 Edge Gateway per Line (aggregating multiple IO blocks).

Time Synchronization (NTP)

Genealogy relies on chronology. If Machine A thinks it is 12:00:00 and Machine B thinks it is 11:59:50, you cannot prove which process happened first. Windows Time is insufficient for industrial precision.

The Standard:

Protocol: NTPv4 (Network Time Protocol).
Source: Local Stratum-2 Server (GPS or Atomic Clock linked). Do not rely on public internet pools (pool.ntp.org) for the OT network.
Drift Tolerance: Maximum ±500ms deviation.

Drift Logic:

If Offset > 500ms → Then Flag Data Quality as "Suspect".
If Offset > 2000ms → Then Trigger Maintenance Alert. Check CMOS battery on the IPC.

Store-and-Forward (Buffering)

The network will fail. When it does, data must flow into a local reservoir, not onto the floor.

Buffering Capacity Rules:

Target: Minimum 72 Hours of local retention. (Enough to survive a weekend outage).
Storage Medium: Industrial SSD (High TBW rating) or localized SQLite database.
Reconnection Logic:
1. LIFO (Last In, First Out) for Status: The dashboard needs the current state immediately.
2. FIFO (First In, First Out) for History: Backfill the historical gaps in chronological order.

Data Loss Strategy (The "Full Disk" Scenario):

If Buffer > 90% Full → Then Trigger Critical IT Ops Alert.
If Buffer = 100% Full:
- Traceability Data (Serial #s, Pass/Fail): STOP THE LINE. Compliance data cannot be discarded.
- Telemetry Data (Amps, Volts, Temps): Overwrite Oldest (Ring Buffer).

Monitoring the Monitors

An Edge Collector that has silently crashed is worse than no collector at all. You need a "Heartbeat" mechanism.

Watchdog Logic:

Heartbeat: Edge sends a "Keep-Alive" pulse every 60 seconds.
Latency Check: Measure Time_Sent vs. Time_Received.
Resource Thresholds:
- CPU: Alert if > 80% for 15 mins.
- RAM: Alert if > 90%.
- Disk: Alert if Free Space < 20%.

Pro-Tip: Use a "Store-and-Forward" flag in your message payload. When backfilling data, mark it as delayed_ingest = true. This prevents your analytics engine from triggering false "Latency Alarms" when processing historical records.

Final Checklist

Category	Metric / Control	Threshold / Rule
Time	NTP Sync	Local Stratum-2 Source. Max Drift < 500ms.
Resilience	Buffer Capacity	Min. 72 Hours of local storage at full data rate.
Continuity	Offline Mode	Traceability = Stop Line; Telemetry = Ring Buffer.
Load	Deadbands	Only transmit analog changes > 0.5% (Configurable).
Health	Watchdog	Server alerts if Edge Heartbeat missing > 3 mins.
Hardware	Specs	Industrial Grade (Fanless, SSD). No Office PCs.
Recovery	Backfill Order	Prioritize Live Status, then backfill History.