1.2 Interoperability and governance
A system architecture without governance is not truly an architecture; rather, it often devolves into a fragile topology of point-to-point connections. In a high-volume manufacturing environment, achieving true interoperability requires the discipline of defining clear boundaries and contracts. For example, when System A writes directly into the database of System B, a boundary has been violated. When System A changes a message format and consequently crashes System B, a contract has been broken.
This chapter establishes the governing principles for how systems in your landscape—such as ERP, MES, and SCADA—coexist. These rules serve as essential architectural constraints to ensure long-term stability and scalability.
Architectural topology rules
To build a robust system, we must avoid creating fragile bridges between applications. Enforcing decoupling is a fundamental principle.
Rule 1: Abolish direct database-to-database integration.
- Prohibition: External systems should not be permitted to execute INSERT, UPDATE, or DELETE statements directly on another system’s SQL database.
- Why: This practice bypasses the crucial business logic and validation layers within the target system. For instance, it might allow a work order to be created for a part number that doesn’t exist, leading to orphaned or inaccurate records.
- Guideline: All integration should occur through a defined abstraction layer, such as an API, a message broker, or an enterprise service bus. This ensures all business rules are applied.
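A minimal sketch of why the abstraction layer matters, assuming a hypothetical `WorkOrderApi` class that stands in for the target system's API. The class, its method, the `UnknownPartError` exception, and the part numbers are illustrative and not taken from any real MES or ERP product. The part-number check runs on every request, which is exactly the step a direct SQL INSERT would skip.

```python
from dataclasses import dataclass, field


class UnknownPartError(ValueError):
    """Raised when a work order references a part number that does not exist."""


@dataclass
class WorkOrderApi:
    """Hypothetical abstraction layer in front of the target system's tables."""

    known_parts: set
    work_orders: list = field(default_factory=list)

    def create_work_order(self, part_number: str, quantity: int) -> dict:
        # Business rules run here, before anything is persisted.
        if part_number not in self.known_parts:
            raise UnknownPartError(f"unknown part number: {part_number}")
        if quantity <= 0:
            raise ValueError("quantity must be a positive integer")
        order = {
            "id": len(self.work_orders) + 1,
            "part_number": part_number,
            "quantity": quantity,
        }
        self.work_orders.append(order)
        return order


api = WorkOrderApi(known_parts={"PN-1001"})
accepted = api.create_work_order("PN-1001", 50)

try:
    api.create_work_order("PN-9999", 10)  # part does not exist
    rejected = False
except UnknownPartError:
    rejected = True
```

The second call is refused, so no orphaned work order is ever stored.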
Rule 2: Prefer a “hub” over a “mesh” topology.
- Constraint: Avoid creating direct mesh connections (where System A connects to B, A to C, and B to C, etc.). This approach scales poorly: a full mesh of n systems needs n(n-1)/2 links, so the integration effort grows quadratically with the number of systems.
- Guideline: Implement a Hub-and-Spoke or Unified Namespace (UNS) pattern. Systems should publish events to a central broker (for example, an MQTT broker or Kafka) or communicate through a central API gateway.
- Benefit: This centralization dramatically simplifies maintenance. When you need to replace a core system like the ERP, you only have to update the single connector at the hub, rather than finding and modifying numerous point-to-point scripts scattered across the network.
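The scaling difference can be illustrated with simple counting. This sketch assumes a full mesh in which every pair of systems is linked directly, and a hub in which each system holds exactly one connection to the central broker; the function names are illustrative.

```python
def mesh_links(systems: int) -> int:
    """Point-to-point links in a full mesh: one per pair of systems."""
    return systems * (systems - 1) // 2


def hub_links(systems: int) -> int:
    """Links in a hub-and-spoke layout: one per system, to the hub."""
    return systems


for n in (3, 5, 10, 20):
    print(f"{n:>2} systems: mesh={mesh_links(n):>3}  hub={hub_links(n):>2}")
```

With 10 systems, a full mesh needs 45 links while the hub needs 10.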
The interface control document (ICD)
The Interface Control Document (ICD) acts as the formal contract between two systems or architectural components. It is highly advisable that no integration code is written until the corresponding ICD has been reviewed and formally approved.
Mandatory ICD components:
- Transport Protocol: Specify the communication method (e.g., HTTPS REST, MQTT, TCP Socket).
- Directionality: Define who initiates the communication (Push vs. Pull model).
- Authentication: Detail the method (e.g., API Key, OAuth, Certificate).
- Schema Definition: Provide the exact structure of the message payload (JSON/XML).
  - Strict Typing: For example, a quantity must be defined as an Integer type, not a generic String.
  - Unit of Measure: Values like temperature must explicitly state their unit (e.g., Celsius), not just be a number like “240”.
- Error States: Document how the system signals different types of failures (e.g., using HTTP 400 for client errors vs. 500 for server errors).
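The schema items above (strict typing and an explicit unit of measure) can be sketched as a small validator. The payload shape, the `ProcessReading` class, and the set of allowed units are assumptions made for illustration; a real ICD would define its own field names, and many teams would express the same rules in a schema language instead of code.

```python
from dataclasses import dataclass

# Assumed set of units; the real list belongs in the ICD.
ALLOWED_TEMPERATURE_UNITS = {"Celsius", "Fahrenheit", "Kelvin"}


@dataclass(frozen=True)
class ProcessReading:
    """Hypothetical payload; field names are illustrative."""

    quantity: int
    temperature_value: float
    temperature_unit: str

    def __post_init__(self) -> None:
        # Strict typing: reject "12" (str) and True (bool is a subclass of int).
        if type(self.quantity) is not int:
            raise TypeError("quantity must be an integer")
        if type(self.temperature_value) not in (int, float):
            raise TypeError("temperature_value must be a number")
        # Unit of measure: a bare 240 is ambiguous, so the unit is mandatory.
        if self.temperature_unit not in ALLOWED_TEMPERATURE_UNITS:
            raise ValueError(f"unsupported unit: {self.temperature_unit!r}")


def parse_reading(payload: dict) -> ProcessReading:
    """Validate a decoded JSON payload against the contract."""
    return ProcessReading(
        quantity=payload["quantity"],
        temperature_value=payload["temperature"]["value"],
        temperature_unit=payload["temperature"]["unit"],
    )


good = parse_reading(
    {"quantity": 12, "temperature": {"value": 240, "unit": "Celsius"}}
)

try:
    parse_reading(
        {"quantity": "12", "temperature": {"value": 240, "unit": "Celsius"}}
    )
    string_quantity_rejected = False
except TypeError:
    string_quantity_rejected = True
```

A sender that transmits the quantity as the string "12" is rejected at the boundary instead of corrupting downstream arithmetic.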
Semantic governance: naming & IDs
Without the ability to uniquely and consistently identify objects across systems, effective control and data aggregation become impossible.
Naming strategy: follow the ISA-95 hierarchy
System and asset names should not be invented arbitrarily. Instead, use the physical hierarchy of your factory to create logical, consistent namespaces.
- Format: Site/Area/Line/Cell/Device
- Example: MEX01/SMT/Line04/Pick & Place02/Feeder12
- Why: This structure allows for logical data aggregation. A query for `MEX01/SMT/*` would correctly return all performance data for the SMT area at that site.
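A minimal sketch of the naming format and the wildcard query, using Python's standard `fnmatch` module as a stand-in for whatever query mechanism your broker or historian provides. The helper names and every asset except the document's own `Feeder12` example are hypothetical.

```python
from fnmatch import fnmatchcase

LEVELS = ("site", "area", "line", "cell", "device")


def build_path(site: str, area: str, line: str, cell: str, device: str) -> str:
    """Join the five hierarchy levels into a Site/Area/Line/Cell/Device path."""
    parts = (site, area, line, cell, device)
    for level, value in zip(LEVELS, parts):
        # A separator inside a name would silently add a hierarchy level.
        if not value or "/" in value:
            raise ValueError(f"invalid {level} name: {value!r}")
    return "/".join(parts)


def query(paths: list, pattern: str) -> list:
    """Return the paths matching a shell-style wildcard pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


assets = [
    build_path("MEX01", "SMT", "Line04", "Pick & Place02", "Feeder12"),
    build_path("MEX01", "SMT", "Line05", "Oven01", "Zone03"),
    build_path("MEX01", "THT", "Line01", "Wave01", "Pump01"),
]

smt_assets = query(assets, "MEX01/SMT/*")
```

Here `smt_assets` contains the two SMT entries and excludes the THT one. Note that wildcard syntax differs between systems: MQTT topic filters, for example, use `+` for a single level and `#` for all remaining levels instead of `*`.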
Identity strategy: implement an immutable UID
- The Problem: Vendor-provided serial numbers are not guaranteed to be unique across different vendors or product types. A resistor reel from Vendor A might share the same ID as a capacitor reel from Vendor B.
- The Solution: Generate an Internal Unique Identifier (UID) at the point of entry into your system, such as during the Receiving process.
- Implementation: Use a robust method like a UUID (e.g., `550e8400-e29b-41d4-a716-446655440000`) or a prefixed integer scheme (e.g., `UID-999999`). This Internal UID should then be used as the primary key in all relevant database tables to ensure unambiguous relationships.
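The identity strategy can be sketched with Python's standard `uuid` module. The `ReceivingRegistry` class and the vendor serial `000123` are hypothetical; the sketch only shows that the internal UID is generated independently of vendor data, so a serial number collision between vendors no longer matters.

```python
import uuid


class ReceivingRegistry:
    """Hypothetical registry that issues internal UIDs at Receiving."""

    def __init__(self) -> None:
        self._by_uid = {}

    def receive(self, vendor: str, vendor_serial: str, material: str) -> str:
        # The UID is random and never derived from vendor data, so two vendors
        # reusing the same serial number still get distinct keys.
        uid = str(uuid.uuid4())
        self._by_uid[uid] = {
            "vendor": vendor,
            "vendor_serial": vendor_serial,
            "material": material,
        }
        return uid

    def lookup(self, uid: str) -> dict:
        """Resolve an internal UID back to the vendor details."""
        return self._by_uid[uid]


registry = ReceivingRegistry()
resistor_uid = registry.receive("Vendor A", "000123", "resistor reel")
capacitor_uid = registry.receive("Vendor B", "000123", "capacitor reel")
```

Both reels carry the same vendor serial, yet each is tracked under its own key. The vendor serial is kept as an ordinary attribute for traceability.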
Temporal governance: time synchronization
Distributed manufacturing systems rely on accurate timing to maintain a correct sequence of events. When system clocks drift apart, the logic that determines cause-and-effect can break down.
Implementing Network Time Protocol (NTP)
- Master Clock: Deploy a local Stratum 1 or 2 NTP server within your operational technology (OT) network for reliable timekeeping.
- Drift Tolerance: Maintain a maximum clock drift of ±500 milliseconds between systems.
- UTC Standardization:
  - Storage: Always record timestamps in databases and log files using Coordinated Universal Time (UTC) in the ISO 8601 format.
  - Display: Convert timestamps from UTC to the user’s local time zone only at the presentation layer (e.g., on an operator’s screen).
  - Risk: Storing timestamps in local time can lead to duplicate or missing records during Daylight Saving Time transitions or when comparing data across time zones.
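The storage and display rules above can be sketched with the standard `datetime` module. The helper names are illustrative, and the fixed UTC-6 offset is an assumption used only to keep the example dependency-free; a real presentation layer would more likely use a DST-aware zone from `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone, tzinfo

MAX_OFFSET = timedelta(milliseconds=500)  # tolerance from the rule above


def to_storage(ts: datetime) -> str:
    """Serialize an aware datetime as an ISO 8601 string in UTC."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamps are ambiguous; attach a time zone")
    return ts.astimezone(timezone.utc).isoformat()


def to_display(stored: str, local_zone: tzinfo) -> str:
    """Convert a stored UTC string to local time for the presentation layer."""
    local = datetime.fromisoformat(stored).astimezone(local_zone)
    return local.strftime("%Y-%m-%d %H:%M:%S")


def within_tolerance(clock_a: datetime, clock_b: datetime) -> bool:
    """True if two clock readings differ by no more than MAX_OFFSET."""
    return abs(clock_a - clock_b) <= MAX_OFFSET


event = datetime(2024, 3, 10, 18, 30, 0, tzinfo=timezone.utc)
stored = to_storage(event)  # "2024-03-10T18:30:00+00:00"

# Fixed UTC-6 offset, assumed for illustration only.
plant_zone = timezone(timedelta(hours=-6))
shown = to_display(stored, plant_zone)  # "2024-03-10 12:30:00"
```

Only `shown` ever carries local time; the stored value stays in UTC, so records from different sites sort and compare correctly.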
Message resilience & versioning
Design your integrations with the assumption that networks will fail and APIs will evolve. Building resilience and clear versioning strategies from the start is critical.
Versioning policy
- Golden Rule: Never break an existing integration contract.
- Implementation: Use semantic versioning and expose the major version directly in your API endpoint paths.
  - Example: Keep the legacy `POST /api/v1/work-order` active.
  - Example: Deploy new features to `POST /api/v2/work-order`.
- Deprecation: When introducing a new version, maintain support for the previous version (e.g., “v1”) for a minimum period, such as six months, to give consumers time to migrate.
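A minimal sketch of running two contract versions side by side, written as a plain dispatch table so that it does not depend on any web framework. The handler names, the response shape, and the `priority` field added in v2 are hypothetical.

```python
def create_work_order_v1(payload: dict) -> dict:
    """Legacy contract: left unchanged so existing consumers keep working."""
    return {"status": "created", "part_number": payload["part_number"]}


def create_work_order_v2(payload: dict) -> dict:
    """New contract: adds a hypothetical 'priority' field."""
    return {
        "status": "created",
        "part_number": payload["part_number"],
        "priority": payload.get("priority", "normal"),
    }


# Both versions stay registered during the deprecation window.
ROUTES = {
    ("POST", "/api/v1/work-order"): create_work_order_v1,
    ("POST", "/api/v2/work-order"): create_work_order_v2,
}


def dispatch(method: str, path: str, payload: dict) -> dict:
    """Route a request to the handler for its method and versioned path."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"status": "not_found"}
    return handler(payload)


legacy = dispatch("POST", "/api/v1/work-order", {"part_number": "PN-1001"})
current = dispatch(
    "POST", "/api/v2/work-order", {"part_number": "PN-1001", "priority": "high"}
)
```

A v1 consumer sees exactly the response it always received, while v2 consumers get the new field. Retiring v1 later is a single deletion from the route table.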
Error handling & idempotency
- Scenario: Consider a case where the MES sends a “Material Consumption” message to the ERP. The ERP receives and processes it successfully, but the acknowledgment message back to the MES is lost in transit. The MES, interpreting the lack of response as a failure, retransmits the original message.
- Risk: Without proper safeguards, the ERP might process the transaction a second time, deducting the materials twice and creating false inventory shortages.
- Requirement: The receiving system (the ERP in this case) should be designed to be idempotent. It should inspect a unique “Message-ID” in the incoming request. If it recognizes that it has already successfully processed a message with that specific ID, it should simply return a “Success” acknowledgment without re-executing the business transaction.
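The scenario above can be sketched as follows. `ErpInventory` is a hypothetical stand-in for the receiving system, and the in-memory set of processed IDs is an assumption made for brevity; a real receiver would persist the IDs, ideally in the same database transaction as the inventory update.

```python
class ErpInventory:
    """Hypothetical receiver that deduplicates on Message-ID."""

    def __init__(self, stock: dict) -> None:
        self.stock = dict(stock)
        self._processed = set()

    def consume(self, message_id: str, part_number: str, quantity: int) -> str:
        if message_id in self._processed:
            # Already applied: acknowledge again without touching inventory.
            return "Success"
        self.stock[part_number] -= quantity
        # Record the ID only after the transaction has been applied.
        self._processed.add(message_id)
        return "Success"


erp = ErpInventory({"PN-1001": 100})
first_ack = erp.consume("msg-0001", "PN-1001", 10)
retry_ack = erp.consume("msg-0001", "PN-1001", 10)  # MES retransmission
```

Both calls return "Success", but the stock for `PN-1001` is reduced only once, from 100 to 90.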
Store-and-forward (buffering)
- Constraint: Network interruptions and momentary drops are a reality in industrial environments.
- Requirement: All edge gateways and critical MES interfaces must be configured to buffer messages locally (on disk or in a persistent queue) if the connection to the upstream system is lost.
- Recovery: Once the connection is restored, the system must flush the buffered messages in strict First-In, First-Out (FIFO) order. This preserves the original sequence of events, which is vital for maintaining accurate production genealogy and logs.
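A minimal sketch of the buffer-and-flush behavior described above. For brevity the buffer is an in-memory `deque`, which is an assumption that does not meet the persistence requirement; a real gateway would back it with disk or a persistent queue. `StoreAndForward` and `FakeUplink` are hypothetical names.

```python
from collections import deque
from typing import Callable


class StoreAndForward:
    """In-memory FIFO buffer; a real gateway would persist this to disk."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._buffer = deque()

    def publish(self, message: str) -> None:
        # Always enqueue first so ordering holds even during an outage.
        self._buffer.append(message)
        self.flush()

    def flush(self) -> None:
        """Send buffered messages oldest-first, stopping at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return  # link still down: keep the message and retry later
            self._buffer.popleft()  # remove only after a successful send

    @property
    def pending(self) -> int:
        return len(self._buffer)


class FakeUplink:
    """Stand-in for the upstream connection."""

    def __init__(self) -> None:
        self.up = False
        self.delivered = []

    def send(self, message: str) -> None:
        if not self.up:
            raise ConnectionError("upstream unreachable")
        self.delivered.append(message)


uplink = FakeUplink()
gateway = StoreAndForward(uplink.send)
gateway.publish("event-1")
gateway.publish("event-2")
pending_during_outage = gateway.pending

uplink.up = True
gateway.publish("event-3")  # triggers a flush of the backlog first
```

Because a message is removed only after a successful send, a crash mid-flush can cause a retransmission but never a loss or a reordering. That is why this pattern pairs naturally with an idempotent receiver.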
Recap: Interoperability and Governance
| Parameter | Requirement | Constraint / Value | Action / Condition | Document |
|---|---|---|---|---|
| Database Integration | Prohibit direct database access. | No external INSERT/UPDATE/DELETE on another system’s SQL DB. | Integrate via defined abstraction layer (API, broker, ESB). | ICD |
| Network Topology | Prefer hub over mesh topology. | Avoid direct system-to-system connections. | Implement hub-and-spoke or Unified Namespace via central broker/gateway. | Architectural Rules |
| Interface Contract | Define formal integration contract. | Must specify: Transport Protocol, Directionality, Authentication, Schema (strict typing, units), Error States. | No integration code written before ICD is reviewed and approved. | ICD |
| Object Identification | Use unique, immutable internal identifier. | Generate Internal UID (e.g., UUID) at system entry point. | Use UID as primary key in all relevant database tables. | Identification Strategy |
| Time Synchronization | Synchronize all system clocks. | Max drift: ±500 ms. Storage format: UTC (ISO 8601). | Deploy local NTP server. Convert to local time only at presentation layer. | NTP Recommendation |
| Message Resilience | Ensure idempotent processing and versioning. | Inspect unique Message-ID. Use semantic versioning in API endpoints (e.g., /api/v2/). | Maintain previous API version for ≥6 months after deprecation. Buffer messages locally on connection loss; flush in FIFO order. | Versioning & Error Handling Policy |