Skip to main content

2.4 Deployment Architecture

Cloud-first architectures fail on the factory floor because the internet is not a real-time control network. Latency, jitter, and outages are physical realities. The operational architecture must follow the "Submarine Principle": the factory must be able to operate autonomously, retaining all data integrity, even when cut off from the outside world.

The Edge Collector Strategy

Do not connect high-frequency machine telemetry directly to a central database. The bandwidth cost is wasteful, and the latency is unacceptable. Deploy Edge Collectors (Industrial PCs or Gateways) at the machine level (Level 1/2).

Responsibilities of the Edge:

  1. Poll: Query the PLC at high frequency (10ms - 100ms).
  2. Normalize: Convert raw register 40001 to Oven_Temp_Zone1.
  3. Filter: Report "Change by Exception" (Deadband) to reduce noise.
  4. Buffer: Store data locally if the uplink fails.

The "1-to-N" Ratio

  • Complex Machines (SMT, CNC): 1 Edge Gateway per Machine.
  • Simple Assets (Conveyors, Scales): 1 Edge Gateway per Line (aggregating multiple IO blocks).

Time Synchronization (NTP)

Genealogy relies on chronology. If Machine A thinks it is 12:00:00 and Machine B thinks it is 11:59:50, you cannot prove which process happened first. Windows Time is insufficient for industrial precision.

The Standard:

  • Protocol: NTPv4 (Network Time Protocol).
  • Source: Local Stratum-2 Server (GPS or Atomic Clock linked). Do not rely on public internet pools (pool.ntp.org) for the OT network.
  • Drift Tolerance: Maximum ±500ms deviation.

Drift Logic:

  • If Offset > 500ms → Then Flag Data Quality as "Suspect".
  • If Offset > 2000ms → Then Trigger Maintenance Alert. Check CMOS battery on the IPC.

Store-and-Forward (Buffering)

The network will fail. When it does, data must flow into a local reservoir, not onto the floor.

Buffering Capacity Rules:

  • Target: Minimum 72 Hours of local retention. (Enough to survive a weekend outage).
  • Storage Medium: Industrial SSD (High TBW rating) or localized SQLite database.
  • Reconnection Logic:
    1. LIFO (Last In, First Out) for Status: The dashboard needs the current state immediately.
    2. FIFO (First In, First Out) for History: Backfill the historical gaps in chronological order.

Data Loss Strategy (The "Full Disk" Scenario):

  • If Buffer > 90% Full → Then Trigger Critical IT Ops Alert.
  • If Buffer = 100% Full:
    • Traceability Data (Serial #s, Pass/Fail): STOP THE LINE. Compliance data cannot be discarded.
    • Telemetry Data (Amps, Volts, Temps): Overwrite Oldest (Ring Buffer).

Monitoring the Monitors

An Edge Collector that has silently crashed is worse than no collector at all. You need a "Heartbeat" mechanism.

Watchdog Logic:

  • Heartbeat: Edge sends a "Keep-Alive" pulse every 60 seconds.
  • Latency Check: Measure Time_Sent vs. Time_Received.
  • Resource Thresholds:
    • CPU: Alert if > 80% for 15 mins.
    • RAM: Alert if > 90%.
    • Disk: Alert if Free Space < 20%.

Pro-Tip: Use a "Store-and-Forward" flag in your message payload. When backfilling data, mark it as delayed_ingest = true. This prevents your analytics engine from triggering false "Latency Alarms" when processing historical records.

Final Checklist

Category

Metric / Control

Threshold / Rule

Time

NTP Sync

Local Stratum-2 Source. Max Drift < 500ms.

Resilience

Buffer Capacity

Min. 72 Hours of local storage at full data rate.

Continuity

Offline Mode

Traceability = Stop Line; Telemetry = Ring Buffer.

Load

Deadbands

Only transmit analog changes > 0.5% (Configurable).

Health

Watchdog

Server alerts if Edge Heartbeat missing > 3 mins.

Hardware

Specs

Industrial Grade (Fanless, SSD). No Office PCs.

Recovery

Backfill Order

Prioritize Live Status, then backfill History.