1.2 Interoperability and governance
A system architecture without governance is not truly an architecture; rather, it often devolves into a fragile topology of point-to-point connections. In a high-volume manufacturing environment, interoperability requires the discipline of defining clear boundaries and contracts. When System A writes directly into the database of System B, a boundary has been violated. When System A changes a message format and consequently crashes System B, a contract has been broken.
This chapter establishes the governing principles for how systems in your landscape (ERP, MES, SCADA) coexist. These rules serve as essential architectural constraints to ensure stability and scalability.
Architectural topology rules
Do not build fragile bridges. Enforce decoupling.
Rule 1: abolish DB-to-DB integration.
- Prohibition: Do not allow an external system to execute INSERT, UPDATE, or DELETE directly on another system’s SQL database.
- Why: This practice bypasses crucial business logic validation (for instance, verifying that a Part Number exists before attempting to create a Work Order) and can lead to orphaned or inaccurate records.
- Guideline: All integration should occur via an Abstraction Layer, such as an API, Message Broker, or Enterprise Service Bus.
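As a minimal sketch of why the abstraction layer matters, the Python below routes every write through one service function that runs the Part Number check before anything is stored. The names `create_work_order`, `PARTS`, and `WORK_ORDERS` are hypothetical stand-ins for the owning system's API and tables, not a specific MES or ERP interface.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the owning system's master data and tables.
PARTS = {"PN-1001", "PN-1002"}
WORK_ORDERS = []


class ValidationError(Exception):
    """Raised when a request violates a business rule."""


@dataclass(frozen=True)
class WorkOrder:
    order_id: str
    part_number: str
    quantity: int


def create_work_order(order_id: str, part_number: str, quantity: int) -> WorkOrder:
    """Single entry point that external systems call instead of raw SQL."""
    if part_number not in PARTS:
        raise ValidationError(f"unknown part number: {part_number}")
    if quantity <= 0:
        raise ValidationError("quantity must be positive")
    order = WorkOrder(order_id, part_number, quantity)
    WORK_ORDERS.append(order)
    return order
```

A caller that submits an unknown Part Number gets a `ValidationError` and nothing is written, whereas a direct INSERT would have created an orphaned Work Order.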
Rule 2: the “hub” vs. “mesh” decision.
- Constraint: Avoid direct mesh connections (System A ↔ System B, A ↔ C, B ↔ C). The number of links grows quadratically: fully connecting N systems requires N(N-1)/2 point-to-point integrations.
- Guideline: Implement a Hub-and-Spoke or Unified Namespace (UNS) pattern. Systems should publish events to a central broker (like MQTT or Kafka) or call a central API Gateway.
- Benefit: When replacing a core system like the ERP, you only need to update a single connector at the Hub, rather than managing numerous distinct point-to-point scripts.
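The scaling argument can be checked with a few lines of arithmetic. This sketch compares the link count of a full mesh against a hub where each system holds one connection. The function names are illustrative only.

```python
def mesh_links(n: int) -> int:
    """Point-to-point links needed to connect every pair of n systems."""
    return n * (n - 1) // 2


def hub_links(n: int) -> int:
    """Links needed when each of n systems connects only to a central hub."""
    return n


for n in (3, 5, 10, 20):
    print(f"{n:>2} systems: mesh={mesh_links(n):>3}  hub={hub_links(n):>2}")
```

For 10 systems this gives 45 mesh links against 10 hub connections, and the gap widens with every system added.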
The interface control document (ICD)
The ICD serves as the formal agreement between two architectural blocks. It is highly recommended that no integration code is written until the ICD is reviewed and approved.
Mandatory ICD components:
- Transport Protocol: (e.g. HTTPS REST, MQTT, TCP Socket).
- Directionality: Who initiates? (Push vs. Pull).
- Authentication: API Key, OAuth, or Certificate.
- Schema Definition: The exact payload structure (JSON/XML).
  - Strict Typing: Quantity must be defined as Integer, not String.
  - Unit of Measure: State the unit explicitly. A temperature must be declared in degrees Celsius, not sent as a bare 240.
- Error States: How does the system signal failure? (e.g. HTTP 400 for a client error vs. 500 for a server error).
Pro-Tip: ICDs must be stored in a Git repository alongside the code. They are living documents, not PDFs buried in SharePoint.
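To illustrate the Strict Typing and Unit of Measure components, here is a sketch of a receiver-side check written in plain Python. The payload shape, the field names (`work_order`, `quantity`, `temperature`, `temperature_unit`), and the unit code `"C"` are all assumptions made for this example. A real ICD would define its own schema.

```python
from dataclasses import dataclass

ALLOWED_TEMPERATURE_UNITS = {"C"}  # assumed unit code for degrees Celsius


@dataclass(frozen=True)
class OvenReading:
    work_order: str
    quantity: int
    temperature: float
    temperature_unit: str


def parse_oven_reading(payload: dict) -> OvenReading:
    """Validate a decoded JSON payload against the (hypothetical) contract."""
    work_order = payload.get("work_order")
    if not isinstance(work_order, str) or not work_order:
        raise ValueError("work_order must be a non-empty string")
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        raise ValueError("quantity must be an integer, not a string")
    temperature = payload.get("temperature")
    if not isinstance(temperature, (int, float)) or isinstance(temperature, bool):
        raise ValueError("temperature must be a number")
    unit = payload.get("temperature_unit")
    if unit not in ALLOWED_TEMPERATURE_UNITS:
        raise ValueError("temperature_unit is required and must be Celsius")
    return OvenReading(work_order, quantity, float(temperature), unit)
```

A payload carrying `"quantity": "10"` or omitting `temperature_unit` is rejected at the boundary. In an HTTP interface that rejection would typically map to a 400 response.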
Semantic governance: naming & IDs
Without the ability to uniquely identify an object, effective control becomes impossible.
Naming strategy: the ISA-95 hierarchy
Do not invent names. Use the physical hierarchy to create logical namespaces.
- Format: Site/Area/Line/Cell/Device
- Example: MEX01/SMT/Line04/Pick & Place02/Feeder12
- Why: This allows you to aggregate data logically. A query for MEX01/SMT/* returns all SMT performance for the site.
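The aggregation idea can be sketched with the standard library's glob matching. The helpers `build_path` and `select` are hypothetical, and every asset name other than the example path above is invented for illustration. A real broker has its own wildcard syntax (MQTT, for instance, uses `+` and `#` rather than `*`).

```python
from fnmatch import fnmatchcase

LEVELS = ("site", "area", "line", "cell", "device")


def build_path(site: str, area: str, line: str, cell: str, device: str) -> str:
    """Join the five hierarchy levels into one namespace path."""
    parts = (site, area, line, cell, device)
    for level, value in zip(LEVELS, parts):
        if not value or "/" in value:
            raise ValueError(f"invalid {level} segment: {value!r}")
    return "/".join(parts)


def select(paths: list[str], pattern: str) -> list[str]:
    """Return the paths matching a glob-style pattern such as 'MEX01/SMT/*'."""
    return [p for p in paths if fnmatchcase(p, pattern)]


paths = [
    build_path("MEX01", "SMT", "Line04", "Pick & Place02", "Feeder12"),
    build_path("MEX01", "SMT", "Line05", "Oven01", "Zone03"),    # invented asset
    build_path("MEX01", "ASSY", "Line01", "Press01", "Ram01"),   # invented asset
]
smt_paths = select(paths, "MEX01/SMT/*")  # the two SMT entries only
```

Because every level is a path segment, the same pattern mechanism works at any depth, for example `MEX01/SMT/Line04/*` for a single line.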
Identity strategy: the immutable UID
- The Problem: Vendor Serial Numbers are not globally unique. A resistor reel from Vendor A might have the same ID as a capacitor reel from Vendor B.
- The Mandate: Generate an Internal Unique Identifier (UID) at the point of entry (Receiving).
- Implementation: Use a UUID (e.g. 550e8400-e29b…) or a prefixed integer (UID-999999). Use this Internal UID as the Primary Key in all database relations.
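A minimal sketch of the mandate, assuming the UUID option: each reel receives a fresh internal UID at Receiving, and the vendor's serial is kept only as an attribute. The `receive_material` function and the in-memory `receiving_ledger` are hypothetical placeholders for the real receiving transaction and its table.

```python
import uuid

# Hypothetical receiving ledger: internal UID -> (vendor, vendor serial).
receiving_ledger: dict[str, tuple[str, str]] = {}


def receive_material(vendor: str, vendor_serial: str) -> str:
    """Assign a new internal UID at receiving and record the vendor's serial."""
    uid = str(uuid.uuid4())
    receiving_ledger[uid] = (vendor, vendor_serial)
    return uid


# Two vendors happen to use the same serial number.
resistor_uid = receive_material("Vendor A", "SN-0001")
capacitor_uid = receive_material("Vendor B", "SN-0001")
```

Both reels carry the serial `SN-0001`, yet they remain distinct records because every downstream relation keys on the internal UID.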
Temporal governance: time synchronization
Distributed systems rely heavily on accurate timing. When clocks drift across different systems, the logic governing cause-and-effect can break down.
The NTP mandate
- Master: Deploy a local Stratum 1/2 NTP Server in the OT Network.
- Drift Tolerance: ±500 ms maximum.
- UTC Standardization:
  - Storage: All timestamps in databases and logs should be recorded in UTC (ISO 8601).
  - Display: Convert timestamps to Local Time only at the presentation layer (e.g. the Operator Screen).
  - Risk: Storing records in Local Time introduces the risk of duplicating or losing data during Daylight Saving Time transitions.
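The Daylight Saving Time risk can be reproduced with fixed offsets. This sketch assumes a plant whose local zone is UTC-4 in summer and UTC-5 in winter, and an illustrative fall-back date. Both are assumptions chosen for the example, not values from any particular site.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the plant's local zone before and after a
# fall-back transition (assumed UTC-4 in summer, UTC-5 in winter).
SUMMER = timezone(timedelta(hours=-4))
WINTER = timezone(timedelta(hours=-5))

# Two events recorded one hour apart, stored in UTC (ISO 8601).
event_a = datetime(2024, 11, 3, 5, 30, tzinfo=timezone.utc)
event_b = datetime(2024, 11, 3, 6, 30, tzinfo=timezone.utc)
stored = [event_a.isoformat(), event_b.isoformat()]

# Presentation layer only: convert to local wall-clock time for the operator.
local_a = event_a.astimezone(SUMMER).strftime("%Y-%m-%d %H:%M")
local_b = event_b.astimezone(WINTER).strftime("%Y-%m-%d %H:%M")

print(stored)            # two distinct, correctly ordered UTC timestamps
print(local_a, local_b)  # the same wall-clock reading for both events
```

The stored UTC values stay distinct and ordered, while the local rendering of both events is `2024-11-03 01:30`. Had the records been stored in Local Time, the two events would be indistinguishable.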
Message resilience & versioning
Assume the network will fail. Assume the API will change.
Versioning policy
- Rule: Never break the contract.
- Implementation: Apply Semantic Versioning and expose the major version in the endpoint path.
  - “POST /api/v1/work-order” (Legacy)
  - “POST /api/v2/work-order” (New Feature)
- Deprecation: Maintain support for the previous version (N-1, here v1) for a minimum of 6 months.
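One way to picture the policy is a route table in which both versions stay registered side by side. The sketch below is framework-free Python. The handler names, the `priority` field added in v2, and the `dispatch` helper are hypothetical and only illustrate the idea of adding a version without touching the old one.

```python
def create_work_order_v1(body: dict) -> dict:
    """Legacy contract: part number and quantity only."""
    return {"part_number": body["part_number"], "quantity": body["quantity"]}


def create_work_order_v2(body: dict) -> dict:
    """New contract: adds an optional priority field (hypothetical feature)."""
    order = create_work_order_v1(body)
    order["priority"] = body.get("priority", "normal")
    return order


# Both versions remain routable during the deprecation window.
ROUTES = {
    ("POST", "/api/v1/work-order"): create_work_order_v1,
    ("POST", "/api/v2/work-order"): create_work_order_v2,
}


def dispatch(method: str, path: str, body: dict) -> tuple[int, dict]:
    """Return (HTTP status, response body) for a request."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": f"no route for {method} {path}"}
    return 200, handler(body)
```

Existing v1 clients keep receiving the response shape they were built against, and retiring v1 later amounts to removing a single route entry once the deprecation period has passed.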
Error handling & idempotency
- Scenario: The MES transmits a “Consumption” message to the ERP. The ERP successfully receives and processes it, but the Acknowledgement response is lost in transit. The MES, assuming a failure occurred, retries the transmission.
- Risk: The ERP might deduct the materials twice, creating inaccurate shortages in the system.
- Requirement: The receiving system should be Idempotent. It needs to inspect the “Message-ID”. When it recognizes that it has already successfully processed a specific message (like “Msg-101”), it should simply return a “Success” acknowledgment without executing the transaction a second time.
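The requirement can be sketched as a receiver that remembers which Message-IDs it has already applied. The class and method names are hypothetical, and the in-memory set is a simplification: a production receiver would presumably persist the processed IDs in the same transaction as the stock update.

```python
class IdempotentReceiver:
    """Applies each consumption message at most once, keyed by Message-ID."""

    def __init__(self, stock: dict[str, int]) -> None:
        self.stock = stock
        self.processed: set[str] = set()

    def handle_consumption(self, message_id: str, part_number: str, quantity: int) -> str:
        if message_id in self.processed:
            # Duplicate delivery: acknowledge again, but do not deduct twice.
            return "Success"
        self.stock[part_number] -= quantity
        self.processed.add(message_id)
        return "Success"


erp = IdempotentReceiver({"PN-1001": 100})
first = erp.handle_consumption("Msg-101", "PN-1001", 10)
retry = erp.handle_consumption("Msg-101", "PN-1001", 10)  # retry after lost ACK
```

Both calls return "Success", and the stock for `PN-1001` is deducted only once.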
Store-and-forward (buffering)
- Constraint: Network partitions and momentary drops are inevitable.
- Mandate: All Edge Gateways and MES interfaces are required to buffer messages locally (via Disk or Queue) if the upstream connection is lost.
- Recovery: Upon connection restoration, the buffer must be flushed in strict FIFO (First-In, First-Out) order to preserve the original sequence of events.
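The buffering and recovery behavior can be sketched with a queue that only removes a message once the upstream send has succeeded. `StoreAndForward`, `send_upstream`, and the `link` flag are hypothetical names. The buffer here lives in memory for brevity, whereas the mandate above calls for a disk or queue-backed buffer that survives a restart.

```python
from collections import deque
from typing import Callable


class StoreAndForward:
    """Buffers messages locally and drains them in FIFO order."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        self._send = send              # returns True when upstream accepted
        self._buffer: deque[str] = deque()

    def publish(self, message: str) -> None:
        self._buffer.append(message)   # always enqueue first to keep ordering
        self.flush()

    def flush(self) -> None:
        while self._buffer:
            if not self._send(self._buffer[0]):
                return                 # still offline: keep the rest buffered
            self._buffer.popleft()     # drop only after a confirmed send

    @property
    def pending(self) -> int:
        return len(self._buffer)


delivered: list[str] = []
link = {"up": False}  # toggled to simulate a network drop


def send_upstream(message: str) -> bool:
    """Hypothetical transport: succeeds only while the link is up."""
    if not link["up"]:
        return False
    delivered.append(message)
    return True


gateway = StoreAndForward(send_upstream)
for msg in ("evt-1", "evt-2", "evt-3"):
    gateway.publish(msg)   # link is down, so all three are buffered

link["up"] = True
gateway.flush()            # connection restored: drain in FIFO order
```

New messages always join the back of the queue, so an event produced during recovery cannot overtake older buffered events.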
Final Checkout: Interoperability and governance
| Governance Pillar | Control Point | Mandatory Standard | Engineering Consequence |
|---|---|---|---|
| Contract | ICD | Signed & Versioned | Prevents “Tribal Knowledge” integrations that are unmaintainable. |
| Topology | Decoupling | No Direct SQL Access | Protects data integrity and allows independent system upgrades. |
| Time | NTP Sync | UTC + Local NTP | Guarantees accurate sequence of events for genealogy. |
| Naming | Namespace | ISA-95 Structured | Enables scalable analytics and clear asset management. |
| Resilience | Retry Logic | Idempotent Receiver | Prevents double-counting inventory when messages are retried after a network drop. |
| Versioning | API Lifecycle | Explicit (v1, v2) | Prevents “Big Bang” deployments; enables safe rollouts. |