CTI - Ensuring System Reliability and Sustainability

Overview

Ensuring Control System Sustainability
Improving Reliability in Ethernet Control Networks

CTI's customers are diverse in terms of geography, size, industry, manufacturing process, and control system complexity. The one commonality across most of these CTI customer sites is that although much of the installed base of PLCs was put into operation between 1995 and 2005, these systems have operated reliably for 20+ years. That’s remarkable longevity. But as these systems continue to age, what are CTI customers doing to ensure this reliable operation continues for years to come? How are they trying to achieve continued sustainability, and how can CTI help them? [click HERE for a quick link to Global Perspectives piece on Best Practices for Maintaining Control System Reliability]

Additionally, the proliferation of Ethernet-enabled devices on the factory floor -- from PLCs to I/O modules all the way down to sensors -- in support of the "Industrial Internet of Things" (IIoT), communications, and other data needs requires continued vigilance against Ethernet traffic disruptions that can have serious consequences for a control network. What types of Ethernet traffic "storms" can occur, and what techniques can be used to prevent or mitigate each type? [click HERE for a quick link to our Tech Tip on Improving Reliability in Ethernet Control Networks]

We at CTI are proud to be part of the legacy of the hard-working and long-lasting TI 505® line, and, like you, we want to ensure that your control systems continue to run as safely, efficiently, cost-effectively -- and reliably -- as possible with the least amount of disruption. We have chosen to provide our customers with products that allow for safe and seamless modernization of their process control systems without having to throw away all the institutional knowledge and investment in product, engineering, and programming that has already been made. Our CTI 2500 Series® product line offers completely refreshed and modernized versions of most of the original TI/Simatic 505® products, as well as brand-new, state-of-the-art products designed both to expand the capability and to improve the performance of these -- and many other -- control systems. So, no matter which path you take to ensure the continued reliability and sustainability of your control system, CTI has a solution to help.

Ensuring Control System Sustainability

There are three typical approaches we at CTI see our customers taking to ensure the continued sustainability of their control systems while balancing their need to conserve capital:

1. Run It Until It Breaks -- The “run it till it breaks” approach is more common than you might expect. You might think this approach would be imprudent, but in some situations it has worked well for customers, and it certainly conserves capital. This approach is most common in seasonal and batch operations where (for various reasons) the operation can tolerate an occasional day or two of downtime. Customers employing this approach purchase a minimal set of the most critical spares and then rely on quick availability of replacement parts, which CTI can normally supply from our factory stock or from stock at one of our distributors.

2. Service Life Replacement -- In this approach, customers settle on a “generally accepted service life” (typically 15 years) and then plan for a wholesale upgrade of the control components during a planned shutdown at the end of that service life. This typically is done along with replacement of other major electro-mechanical components, so upgrading the control system becomes only one part of a “re-life” effort aimed at restoring the entire process to a “like new” condition.

On a recent trip to Europe, CTI's SVP of Sales met with a customer whose plant produces refractory material and who has a project like this currently underway. Their existing control system has reached the end of the service life they set for it, and they are planning to replace all control components. The customer is fabricating completely new control cabinets with the latest CTI 2500 Series components. These cabinets will be placed on the factory floor beside the existing controls. When that effort is complete, they will do a simple changeover during shutdown using a “field wiring extension” as shown below. Of course, it would also be possible to make this upgrade by simply replacing the PLC equipment in the existing control cabinet, but in this case, they decided to go with an entirely new control cabinet.

3. Hybrid -- The hybrid approach is a little more onerous in that it requires careful monitoring of system reliability, noting when there are process shutdowns due to PLC problems. When those shutdowns become frequent enough to be a problem, and particularly when they have the signature of “wear out” failure, companies that follow this approach plan a partial or complete modernization of the PLC system.

What is this “signature” that indicates the beginning of problems associated with the end of service life? In a recent conversation with one of our customers in the air separation industry, we learned that they watch for “ghost faults” which look like the following:

a) Temperature or pressure readings which spike high or low for a few scans
b) Digital inputs which go in/out for a few scans
c) Failure of the PLC system to power-up properly after a planned shutdown
d) Recurring faults, especially if the problem jumps from rack to rack

When they see these kinds of faults happen, they begin planning an upgrade of the PLC system to replace the power supply, CPU, and I/O cards. And because CTI products are wiring-compatible and functionally the same as their existing Siemens modules, the shutdown time required to make the upgrade is minimal.
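As an illustration of fault type (a), the sketch below scans a log of analog readings for excursions that last only a few scans. It is a minimal sketch, not the customer's actual method: the function name `find_transient_spikes`, the threshold, the three-scan limit, and the sample data are all assumptions made for this example.

```python
from statistics import median


def find_transient_spikes(readings, threshold, max_scans=3):
    """Return (start_index, length) for each short-lived excursion.

    An excursion is a run of consecutive readings that differ from the
    median of the log by more than `threshold`. Only runs lasting at most
    `max_scans` scans are reported; longer runs look like a real process
    change rather than a "ghost fault".
    """
    baseline = median(readings)
    spikes = []
    run_start = None
    for i, value in enumerate(readings):
        deviating = abs(value - baseline) > threshold
        if deviating and run_start is None:
            run_start = i                      # excursion begins
        elif not deviating and run_start is not None:
            length = i - run_start             # excursion just ended
            if length <= max_scans:
                spikes.append((run_start, length))
            run_start = None
    # A run still open at the end of the log is ignored: we cannot yet
    # tell whether it is transient.
    return spikes


# Hypothetical temperature log, one value per PLC scan.
scan_log = [72.1, 72.0, 72.2, 140.5, 139.8, 72.1, 72.0, 71.9, 5.0, 72.2]
print(find_transient_spikes(scan_log, threshold=20.0))
```

In this sample, the two-scan high spike and the one-scan low spike are reported, while a sustained change in the reading would not be.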

In addition to these “wholesale system upgrades,” they also pay close attention in all their systems to “early wear,” even when there are no ghost faults occurring. For example, they try to eliminate the use of relay modules, which have a fixed life, replacing them instead with solid-state digital output modules.

When they do have a planned major electro-mechanical upgrade of a plant, they include a PLC system upgrade as part of that budget. Normally, the cost of the PLC system is low compared to other components like motors, bearings, and compressors. And if the budget won’t support a PLC system upgrade, then they will increase their level of spares over time so that they have the materials available for a complete PLC upgrade in the future.

This hybrid approach, although it requires a little more work, is probably the best approach for conserving capital because it focuses on continued use of the existing system for as long as the reliability is good. This maximizes the return from that initial investment. Careful monitoring for signs of component failure is essential, however, in order to minimize the risk of unexpected shutdowns.

CTI Solutions -- Regardless of which approach you choose, CTI has the solutions to not only preserve the reliability of your Simatic/TI 505 control system, but also to modernize it, adding new capabilities to make the process even better. And best of all, because CTI 2500 Series products are compatible with the older TI/Simatic 505 (and even 500!) systems, it is possible to both ensure the continued sustainability of your control system and modernize it without requiring any changes to your existing PLC programs or I/O wiring, without a lengthy shutdown, and at a low cost.

Improving Reliability in Ethernet Control Networks

As control networks become ever more reliant on Ethernet for communications, it is more important than ever to set up, configure, and operate your Ethernet networks properly to avoid the potentially serious problems that Ethernet traffic storms can cause. Although broadcast storms are the primary concern, multicast and unicast traffic can sometimes reach levels that compromise control system performance. The most important step you can take to ensure the proper functioning of your network is to set it up the right way in the beginning. But even if you have configured your network properly from the start, there are other reasons that you might find yourself plagued by network traffic problems. Keep reading to understand the types of Ethernet traffic, the issues that too much of any type of traffic can cause, and some ways to mitigate potential problems.

Types of Ethernet Traffic

To diagnose network reliability problems, it is important to understand the types of Ethernet packets that may be transported by the network. There are three types of Ethernet network traffic that are common to any Ethernet network and essential to its proper operation: unicast, multicast, and broadcast. 

Unicast traffic: Ethernet packets addressed directly to the MAC address of a specific device. 
Multicast traffic: Ethernet packets whose destination address is a multicast group address. Ethernet devices that want to process these packets are configured to listen on a particular group address. 
Broadcast traffic: Ethernet packets whose destination address is the broadcast address. All devices connected to the same Ethernet network receive broadcast packets.
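The three categories can be told apart from the destination MAC address alone: the broadcast address is all ones (ff:ff:ff:ff:ff:ff), and a group (multicast) address has the least-significant bit of its first octet set. The short sketch below applies that rule; the function name and the sample addresses are made up for illustration.

```python
def classify_destination(mac: str) -> str:
    """Classify an Ethernet destination address as unicast, multicast, or broadcast."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    if octets == b"\xff" * 6:
        return "broadcast"          # all-ones address: every device receives it
    if octets[0] & 0x01:
        return "multicast"          # group bit set in the first octet
    return "unicast"                # addressed to one specific interface


for address in ("00:1a:2b:3c:4d:5e", "01:00:5e:00:00:01", "ff:ff:ff:ff:ff:ff"):
    print(address, classify_destination(address))
```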

While all three types of Ethernet traffic are common to any Ethernet network, excessive traffic of any kind can cause problems for an Ethernet network, particularly for an industrial control network which typically has more limited processing power relative to an office network and is especially sensitive to large volumes of message traffic.

Can There Be Too Much of a Good Thing?

Excessive Ethernet traffic is often referred to as a “traffic storm.” The most common type of traffic storm is a broadcast storm, but both multicast and unicast traffic can cause issues for your control network as well.

Several modern Ethernet I/O protocols are based on multicast messages, and when a network contains large numbers of devices using multicast protocols, network loading can increase substantially. In situations where there are multiple SCADA systems polling for large amounts of data, even unicast message traffic can sometimes reach a level where problems can occur.

The following sections discuss the different types of network traffic storms and ways to prevent and mitigate each type. To determine which type of traffic (and which device) is causing your traffic storm, it is usually necessary to use a protocol analyzer or “packet sniffer” to capture and analyze network traffic. We at CTI prefer to use Wireshark, a free, open-source network protocol analyzer. While this paper will not attempt to provide further information on using Wireshark or any other network protocol analyzer, there is a wealth of information available on the Internet. One article we like is by Brian Hill at www.arstechnica.com/informationtechnology/2016/09/the-power-of-protocol-analyzers/.

Unicast Traffic: How Can High Levels Affect the Control System?

The typical Ethernet-enabled PLC sees several sources of unicast traffic, including:

SCADA systems polling the PLC for data 
HMI panels polling the PLC for data 
Other PLCs polling for data 
PCs performing programming operations

Each Ethernet packet received by the PLC causes an “interrupt,” requiring processor resources to remove the packet from the receive buffer and save it for processing later during the communications task. When high unicast packet rates occur, the PLC can spend considerable time in this “interrupt” processing. If the packet rate becomes excessive, larger and larger amounts of processing time are consumed, resulting in degradation of process control tasks and dropped Ethernet packets.

Ethernet switches learn the MAC address of devices communicating with each switch port. Once the MAC address is known, the switch forwards unicast packets with a given unicast destination only to the corresponding port. The network interface of an Ethernet device blocks reception of all unicast packets except for those whose destination address equals the MAC address of the interface. For these reasons, excessive loading due to unicast packets is less likely. Nevertheless, it is still possible under certain conditions, including the following: 

A large number of devices, such as SCADA/HMI systems, that are rapidly polling for data. While the average packet rate is often acceptable, traffic from these systems tends to arrive in bursts: a flurry of packets followed by a period with no packets. As the number of devices increases, bursts from multiple devices overlap, causing large traffic peaks.
Misconfigured devices erroneously sending packets to the IP address of the device.
Denial of Service (DoS) attacks, where a flurry of packets is sent to the unicast address in order to degrade operation.
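To see why overlapping bursts matter even when the average rate looks acceptable, the sketch below compares average and peak packet rates for several pollers. All figures are invented for illustration: five pollers, each sending a burst of 50 packets (1 ms apart) once per second, with the peak measured over 100 ms windows.

```python
from collections import Counter


def burst_times(start_ms, period_ms, packets_per_burst, bursts):
    """Timestamps (ms) for a poller that sends one burst per period, 1 ms per packet."""
    return [start_ms + b * period_ms + p
            for b in range(bursts)
            for p in range(packets_per_burst)]


def average_rate_pps(timestamps_ms, duration_ms):
    """Average packets per second over the whole observation period."""
    return len(timestamps_ms) * 1000 / duration_ms


def peak_rate_pps(timestamps_ms, window_ms=100):
    """Highest packets-per-second rate seen in any fixed window."""
    buckets = Counter(t // window_ms for t in timestamps_ms)
    return max(buckets.values()) * 1000 / window_ms


def combined(start_offsets_ms):
    """Merge the traffic of one poller per start offset."""
    packets = []
    for start in start_offsets_ms:
        packets += burst_times(start, period_ms=1000, packets_per_burst=50, bursts=10)
    return packets


aligned = combined([0, 0, 0, 0, 0])              # all pollers fire together
staggered = combined([0, 200, 400, 600, 800])    # same load, spread out

for name, packets in (("aligned", aligned), ("staggered", staggered)):
    print(name, average_rate_pps(packets, 10_000), peak_rate_pps(packets))
```

Both cases average 250 packets per second, but the aligned bursts peak at 2500 packets per second while the staggered ones peak at 500. The PLC has to absorb the peak, not the average.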

Mitigating Unicast Traffic Problems

Depending on your situation, the following actions can resolve unicast traffic problems: 

Reduce the polling rate of the SCADA/HMI systems, if possible. Most of these systems poll more rapidly than required (twice as fast as the required update time). 
Locate and reconfigure/disable any offending devices. 
Connect the device to a switch capable of rate limiting. Configure the switch to limit packet forwarding to an acceptable packet rate. Because rate limiting can buffer Ethernet packets, it can be used to level out traffic peaks. 
Use a CTI communications module, such as a 2572-B or 2500P-ECC1 for network communications.
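The leveling effect of a buffering rate limiter can be shown with a very small model. This is a simplified sketch, not the algorithm of any particular switch: time is divided into equal ticks, the limiter forwards at most `limit_per_tick` packets per tick, holds the excess in a buffer of `buffer_size` packets, and drops whatever does not fit. All names and numbers are assumptions.

```python
def shape_traffic(arrivals_per_tick, limit_per_tick, buffer_size):
    """Return (forwarded_per_tick, dropped_total) for a buffering rate limiter."""
    queued = 0
    dropped = 0
    forwarded = []
    for arriving in arrivals_per_tick:
        queued += arriving
        sent = min(queued, limit_per_tick)      # forward up to the rate limit
        queued -= sent
        overflow = max(0, queued - buffer_size) # excess beyond the buffer is lost
        dropped += overflow
        queued -= overflow
        forwarded.append(sent)
    return forwarded, dropped


bursty = [10, 0, 0, 0, 10, 0, 0, 0]   # two bursts of 10 packets

print(shape_traffic(bursty, limit_per_tick=3, buffer_size=20))
print(shape_traffic(bursty, limit_per_tick=3, buffer_size=5))
```

With a large enough buffer, each burst of 10 is spread over four ticks and nothing is lost. With a buffer of only 5 packets, the peaks are still flattened but four packets are dropped, so the limit and buffer need to be chosen with the real traffic pattern in mind.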

Multicast Traffic: A Commonly Overlooked Cause for Control Network Issues

Properly implemented, multicast messaging is an efficient communications method for your control system when the same data needs to be transmitted to multiple recipients. EtherNet/IP often uses multicast for communications between the PLC and I/O devices.

However, in certain circumstances, multicast traffic can create problems. Multicast traffic tends to have higher packet rates than unicast traffic because there is no requirement to wait for a device to reply.

By default, Ethernet switches flood multicast packets to all switch ports, thus propagating these packets throughout the Ethernet network. Combined with high rates of multicast traffic (for example with EtherNet/IP I/O), this behavior can adversely affect the operation of network devices.

Most devices always allow some multicast traffic to pass through the Ethernet interface. This traffic includes packets with multicast addresses related to network control and multicast group management. Consequently, it is always possible for multicast traffic to cause interrupts on the device.

Mitigating Multicast Network Problems

Multicast traffic will not present a problem for the CTI 2500-Cxxx Processor: it does not support multicast and is configured to block reception of all multicast messages. To prevent unwanted multicast traffic from flooding other CTI products, you have several options. If you do not wish to use your CTI Ethernet-enabled products for multicast communications at all, you can connect them to a managed switch that supports IGMP (Internet Group Management Protocol) snooping (almost all do). This feature detects IGMP requests to join a particular multicast group and forwards the associated multicast stream only to the port connected to the requesting device. Most switches that support IGMP snooping can be configured to discard multicast streams that are unknown to the switch. If you want your CTI product to receive a multicast stream, however, this solution will not work properly, since it requires the CTI product to respond to an IGMP query from a router. Current CTI products will not respond to an IGMP query, as they were designed for multicast communications on the local network only.

An alternate solution when you want your CTI product to receive multicast messages is to use a managed Ethernet switch that supports Bridge Multicast Filtering. This feature allows you to statically define how multicast packets are forwarded. If you want to receive multicast packets with a particular group address, you can statically assign the group address to the port. You can also choose to block all multicast packets from being forwarded to designated ports.
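The forwarding decision behind static multicast filtering can be modeled in a few lines. This is only a conceptual sketch: the function, the port numbers, and the group addresses are hypothetical, and the actual configuration steps depend on the switch vendor.

```python
def egress_ports(group, ingress_port, static_groups, all_ports,
                 forward_unregistered=False):
    """Return the ports a multicast frame is forwarded to.

    static_groups maps a multicast group address to the set of ports that
    were statically assigned to it. Frames for groups with no entry are
    either flooded (forward_unregistered=True) or discarded.
    """
    if group in static_groups:
        targets = set(static_groups[group])
    elif forward_unregistered:
        targets = set(all_ports)
    else:
        targets = set()
    return sorted(targets - {ingress_port})   # never send back out the source port


ALL_PORTS = {1, 2, 3, 4}
# Hypothetical setup: the device on port 3 should receive this one group.
STATIC_GROUPS = {"01:00:5e:00:01:0a": {3}}

print(egress_ports("01:00:5e:00:01:0a", 1, STATIC_GROUPS, ALL_PORTS))
print(egress_ports("01:00:5e:7f:00:01", 1, STATIC_GROUPS, ALL_PORTS))
```

In this model the registered group reaches only port 3, and the unregistered group reaches no port at all unless flooding of unregistered groups is enabled.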

If you don’t want to add an external switch, another possibility is to use a CTI product such as the 2500P-ECC1 and/or 2500P-ACP1, each of which employs an embedded switch that limits the rate of multicast (and broadcast) packets. While this solution is often effective, there is some risk of missing packets that you want to receive, since the limiting algorithm begins discarding packets after the maximum threshold is exceeded.

A more global solution is to segment your Ethernet network, as discussed in the following sections.

Broadcast Traffic: The Most Common Culprit for Network Disruptions

Broadcast traffic is required for proper operation of TCP/IP over Ethernet. For example, the Address Resolution Protocol (ARP), which discovers the MAC address of a device with a known IP address, is required in order to transmit TCP/IP unicast messages via an Ethernet link. As noted earlier, all devices on an Ethernet network must consume resources to process each broadcast packet. As more and more devices are added to the network, the number of broadcasts naturally increases.

Broadcast storms occur when an abnormally high number of broadcast messages are sent within a short period of time, overwhelming devices on the network and oftentimes causing congestion and dropped packets in the network switches.

While broadcast storms may simply be a nuisance in an office network, they can be catastrophic in a control network. Because of size, cost, and power constraints, devices in a control network typically have limited processing power relative to an office computer. In addition, those limited resources must remain dedicated to the primary control task to ensure proper equipment operation.

What Causes a Broadcast Storm?

While there can be many factors that contribute to a broadcast storm, the most typical causes are the following:

Combining the plant floor network with the IT network. Information Technology (IT) networks often generate a lot of broadcast traffic. While this level of broadcast is acceptable for the IT network, it can seriously degrade the performance of control systems, which require real-time operation.
Excessively large control networks. Even when isolated from the IT network, large control networks themselves can generate too much broadcast traffic due to the number of devices attached to the network and the protocols employed.
Poorly designed control protocols. It is not unusual to find rogue Ethernet communications and I/O protocols that use broadcast as the primary means of delivering data. Typically these are legacy protocols that were developed in the early stages of Ethernet adoption.
Hardware Failure/Defective Switches. A defective switch, router or computer network interface can flood the network with broadcast traffic. It is worth it to invest in quality networking equipment that is equipped with storm prevention features.
Human Error. A common human error is when a loop is inadvertently created causing broadcast traffic to continually repeat through the network. A loop can be created either by connecting both ends of a cable into two ports of the same switch or by creating a loop among several switches.

Preventing or Mitigating Broadcast Storms

There are several ways to reduce the occurrence of storms and/or to mitigate network disruptions caused by a storm.

1. Isolate the IT Network from the Control Network
Ensure that your corporate IT network is either physically isolated from your control network or is separated from it by properly configured routers and firewalls. Routers forward unicast and designated multicast TCP/IP packets but do not forward broadcast traffic. Firewalls further limit the TCP/IP packets that will be accepted. These network components allow communications between the administrative offices and plant floor while preventing broadcast and undesirable unicast/multicast packets from entering the control network.

2. Subdivide your plant floor network into smaller segments
One of the best ways to tame broadcast and multicast traffic is to subdivide the network into smaller broadcast domains. Most plant floor environments can naturally be divided into work areas that are automated by one or more controllers and associated I/O. Each broadcast domain would consist of an isolated Ethernet Local Area Network (LAN), connected to the larger plant-wide network via a Layer 3 switch. The Layer 3 switch performs the function of routing TCP/IP packets between the LAN and the main network while also providing Ethernet switching functions (Layer 2). Consequently, broadcast packets from the larger network are not propagated to the LAN (see Figure 1 below).
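When each work area becomes its own routed LAN, each one also needs its own IP subnet. The sketch below uses Python's standard `ipaddress` module to carve one address block into per-area subnets. The 10.20.0.0/16 block, the /24 size, and the work-area names are assumptions chosen only for illustration.

```python
import ipaddress

# Hypothetical plant-wide address block and work areas.
plant = ipaddress.ip_network("10.20.0.0/16")
work_areas = ["packaging", "mixing", "utilities"]

# Hand out consecutive /24 subnets, one per work area.
available = plant.subnets(new_prefix=24)
plan = {area: next(available) for area in work_areas}

for area, subnet in plan.items():
    usable_hosts = subnet.num_addresses - 2   # minus network and broadcast addresses
    print(f"{area:10s} {subnet}  broadcast {subnet.broadcast_address}  "
          f"{usable_hosts} usable hosts")
```

Each subnet would sit behind its own routed interface on the Layer 3 switch, so broadcasts sent in one work area stay within that work area.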

3. Implement Broadcast Storm Control
One of the easiest and best ways to minimize the effects of storms and high packet rates is by installing a managed switch. All managed switches have the ability to set rate limits for broadcast traffic. Many also enable rate limiting of multicast and unicast traffic. As an example, the illustration below (Figure 2) shows the Storm Control configuration web page for a Cisco 8-port managed switch. This switch allows setting of independent packet rate limits for broadcast, multicast, and unicast traffic. The rate limits are set as a percentage of the total network bandwidth or as packets per second (pps). For ports connected to CTI products, we recommend limiting the port to 1000 broadcast/multicast packets per second (1% of a 100 Mb/s link).
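Because some switches express the limit as a percentage of bandwidth and others as packets per second, it helps to be able to convert between the two. The relationship depends on frame size, so the sketch below makes that explicit. It counts the 20 bytes of preamble and inter-frame gap that accompany every Ethernet frame on the wire; the function names and the sample frame sizes are assumptions.

```python
PREAMBLE_AND_GAP_BYTES = 20   # 8-byte preamble/SFD plus 12-byte inter-frame gap


def link_share_percent(packets_per_second, frame_bytes, link_bps=100_000_000):
    """Percentage of link bandwidth used by a given packet rate and frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_GAP_BYTES) * 8
    return packets_per_second * bits_on_wire / link_bps * 100


def packets_per_second_for(percent, frame_bytes, link_bps=100_000_000):
    """Packet rate that corresponds to a percentage limit at a given frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_GAP_BYTES) * 8
    return link_bps * percent / 100 / bits_on_wire


for size in (64, 105, 1518):
    print(f"1000 pps of {size}-byte frames = "
          f"{link_share_percent(1000, size):.2f}% of a 100 Mb/s link")

print(f"1% of a 100 Mb/s link = "
      f"{packets_per_second_for(1, 64):.0f} pps of 64-byte frames")
```

Under these assumptions, 1000 packets per second is about 0.67% of the link for minimum-size 64-byte frames, 1% for frames of roughly 105 bytes, and about 12.3% for full-size 1518-byte frames. If your switch only accepts a percentage, check which frame size it assumes before relying on the converted value.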

4. Implement Switch Loopback Protection
Many modern Ethernet switches provide a loopback detection feature, which prevents the inadvertent creation of a loop. The feature operates by periodically transmitting loop protocol packets out of a port that is enabled for Loopback Detection. If the same packet is subsequently received by the port, the port is automatically disabled, stopping packet loop propagation. Many of today’s switches support Rapid Spanning Tree Protocol (RSTP) to keep the network topology loop-free when employing redundant networks. RSTP, loops and network redundancy are advanced topics not covered by this Tech Tip.

5. Training
To reduce human error, it is important to properly train all personnel who will be interacting with the network. Having a basic knowledge of network operation, including an understanding of the types of traffic storms and how to prevent or mitigate them, can prevent errors, aid in recognizing typical problems, and improve the ability to accurately report anomalies to support personnel. Having technicians who can use a network capture program, such as Wireshark, to record network events can vastly improve the ability of CTI customer support personnel to assist with the resolution of a communications problem.

Conclusion

Ethernet is continuing to revolutionize industrial control systems. As manufacturers explore the “Industrial Internet of Things” (IIoT) and implement more and more Ethernet-connected devices, it is ever more important to understand and manage the risk of Ethernet traffic storms. This Tech Tip provided a high-level view of the risks of traffic storms and some steps you can take to prevent or mitigate them.

Please contact us if you need additional assistance.
