Beware Ethernet flow control
After having been recently bitten by Ethernet's flow control mechanism, I decided to learn about this somewhat obscure but commonly used facet of modern networks. This post is a summary of what I discovered about it and its associated benefits and dangers.
What is flow control?
Ethernet flow control, or 802.3x, is a way for a network device to tell its immediate neighbor that it is overloaded with data, such as when a device is receiving data faster than it can process it. It allows for an overloaded device to send out a special Ethernet frame, called a pause frame, that asks the device on the other end of the wire to stop sending data temporarily. If the receiving device honors the pause frame then the sending device has time to catch up on the stack of received data that it hasn't had time to process yet.
There also exists an older method for flow control called "back pressure" that is used in half-duplex environments (i.e. non-switched Ethernet). It consists of the overloaded device "jamming" the medium temporarily until it has the ability to accept more data. I don't know much about half-duplex flow control, and thus I won't mention it again; everything here applies solely to full-duplex flow control via 802.3x. Also, TCP has a mechanism for performing its own flow control that is entirely different from Ethernet's flow control; I will not be fully explaining TCP's flow control method here, as it would merit a lengthy discussion itself.
Rules of the game
When thinking about Ethernet flow control, it is important to keep several things in mind:
- Flow control operates at a lower layer than TCP or IP, and thus is independent of them. Put another way, flow control is capable of being used regardless of what higher-level protocols are put on top of it. An important side-effect of this is that neither TCP nor IP know what Ethernet's flow control is doing; they operate under the assumption that there is no flow control other than what they may or may not provide themselves.
- Flow control functions between two directly connected network devices, and flow control frames are never forwarded between links. Thus, two computers that are connected via a switch will never send pause frames to each other, but could send pause frames to the switch itself (and vice versa: the switch can send pause frames to the two computers).
- Pause frames have a limited duration; they will automatically "expire" after a certain amount of time. The expiration time is set by the device that transmits the pause frame.
- A paused link is not a discriminator of protocols; it will prevent any data from being passed across the link other than more pause frames.
TCP breakage
Okay, it isn't true, TCP doesn't stop working when flow control is enabled. However, an important part of it does stop working correctly: its own flow control mechanism. TCP flow control uses a more complex mechanism of timeouts and acknowledgement segments to determine when a remote device is overloaded. It basically sends at a faster and faster pace until it sees that some of its sent data isn't getting to the remote device and then slows down. This allows TCP to utilize network links in a somewhat intelligent manner, as an overloaded network or device will cause some TCP segments to be lost and thus cause the sender to send data at a slower rate.
Now consider what happens when Ethernet flow control is mixed with TCP flow control. Let's assume that we have two directly connected computers, one of which is much slower than the other. The faster sending computer starts sending lots of data to the slower receiving computer. The receiver eventually notices that it is getting overloaded with data and sends a pause frame to the sender. The sender sees the pause frame and stops sending temporarily. Once the pause frame expires, the sender will resume sending its flood of data to the other computer. Unfortunately, the TCP engine on the sender will not recognize that the receiver is overloaded, as there was no lost data -- the receiver will typically stop the sender before it loses any data. Thus, the sender will continue to speed up at an exponential rate; because it didn't see any lost data, it will send data twice as fast as before! Because the receiver has a permanent speed disadvantage, this will require the receiver to send out pause frames twice as often. Things start snowballing until the receiver pauses the sender so often that the sender starts dropping its own data before it sends it, and thus finally sees some data being lost and slows down.
Is this a problem? In some ways it isn't. Because TCP is a reliable protocol, nothing is ever really "lost"; it is simply retransmitted and life goes on. Ethernet flow control accomplishes the same thing as TCP flow control in this situation, as they both slow down the data transmission to the speed that the slower device can handle. There are some arguments to be made for there being an awkward overlap between the two flow control mechanisms, but it could be worse.
Unfortunately, it does get worse.
Head-of-line blocking
In the last example, I considered the case where two computers were directly connected to each other. This example is too simplistic to be of much use -- when was the last time you saw two directly connected computers? It is a bit of a rarity. Let's now look at what happens when you introduce a switch into the mix. For our purposes, let us assume that the switch fully supports Ethernet flow control and that it is willing to use it. Our new setup will consist of two desktop computers and one file server, all of which are attached to the switch. It isn't any fun to make everything perfect, so let's also say that one of the desktops has a 10 Mbps connection to the switch while the other desktop and the server have 100 Mbps connections.
This setup is usually fine -- the 10 Mbps connection will be slower than the others, but it doesn't cause too many problems, just slower service to the one desktop. Things could get ugly, though, if Ethernet flow control is enabled on the switch. Imagine that the 10 Mbps desktop requests a large file from the file server. The file server begins to send the file to the desktop initially at a slow rate, but quickly picks up steam. Eventually, the file server will start to send data to the desktop at 11 Mbps, which is more than the poor 10 Mbps connection can handle. Without flow control enabled on the switch, the switch would start to simply drop data segments destined to the desktop, which the file server would notice and start to throttle back its sending rate.
With flow control enabled on the switch, though, the switch takes a very different approach; it will send out its own pause frames to any port that is sending data to the now-overloaded 10 Mbps port. This means that the file server will receive a pause frame from the switch, requesting it to cease all transmissions for a certain amount of time. Is this a problem? Yes! Because pause frames cease all transmissions on the link, any other data that the file server is sending will be paused as well, including data that may be destined to the 100 Mbps desktop computer. Eventually the pause will expire and the file server will continue sending out data. Unfortunately, the TCP mechanism on the file server will not know that anything is wrong and will continue sending out data at faster and faster speeds, thus overloading the 10 Mbps desktop again. As before, the cycle will keep repeating itself until the file server starts dropping its own data. Unlike the previous situation, the innocent 100 Mbps desktop bystander is penalized and will see its transfers from the file server drop to 10 Mbps speeds.
This situation is called head-of-line blocking, and it is the major reason why Ethernet flow control is somewhat dangerous to use. When enabled on network switches, it can create situations where one slow link in a network can bring the rest of the network to a crawl. It gets especially bad if the backbones in your network have flow control enabled; it should be obvious by this point just how bad that could get.
When to enable flow control
So what should you do? Should you completely disable flow control on all computers and switches? Not necessarily. It is generally safe to leave flow control enabled on computers. Switches, though, should either have flow control disabled or configured such that they will honor received pause frames but will never send out new pause frames. Some Cisco switches are even permanently configured this way -- they can receive pause frames but never emit them. To be honest, the complete answer to flow control is somewhat more complicated than this (e.g. you could probably enable pause frame emission if a switch port is connected to a slow backplane), but the safest bet is to disable flow control when given the option.