Higher capacity throughout the network means less congestion. It’s old-think, they say, to assume that if you give users and servers faster LAN connections, you’ll simply admit more traffic and congest the trunks. “Applications determine traffic,” one CIO pointed out. “The network doesn’t suck data into it at the interface. Applications push it.” Faster connections mean less congestion, which means fewer complaints, and they mean more alternate paths traffic can take without delay and loss, which also reduces complaints. In fact, anything that creates packet loss, outages, or even latency creates complaints, and addressing complaints is a big source of opex. The complexity comes in because network speed affects user and application quality of experience in multiple ways, ways that go beyond the obvious congestion impacts.
When a data packet passes through a switch or router, it’s exposed to two things that can delay it. Congestion is one; the other is “serialization delay.” This complex-sounding term reflects a simple fact: a device can’t switch a packet it hasn’t fully received, so every packet is delayed while its bits arrive. The length of that delay is set by the speed of the connection the packet arrives on, so faster interfaces always offer lower latency, and the total delay a given packet experiences is the sum of the serialization delay of every interface it passes through.
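To make the arithmetic concrete, here is a minimal sketch of that per-interface delay, assuming a simple store-and-forward model and a full-size 1500-byte Ethernet frame; the link speeds and the five-hop path are illustrative choices, not figures from the enterprises quoted here.

```python
# Minimal sketch of serialization delay, assuming store-and-forward:
# a packet can't be switched until every bit has arrived, so each
# interface adds (packet size in bits / link speed) to the latency.

PACKET_BYTES = 1500  # a typical full-size Ethernet frame

def serialization_delay_us(packet_bytes: int, link_gbps: float) -> float:
    """Microseconds needed to clock one packet in at a given link speed."""
    bits = packet_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6

for gbps in (1, 10, 25, 100):
    per_hop = serialization_delay_us(PACKET_BYTES, gbps)
    print(f"{gbps:>3} Gbps: {per_hop:6.2f} us per hop, "
          f"{per_hop * 5:6.2f} us over a 5-hop path")
```

At 1 Gbps a full-size frame costs about 12 microseconds per hop; at 100 Gbps the same hop costs a fraction of a microsecond, which is why faster interfaces trim latency even when there is no congestion at all.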
Application designs, component costs and AI reshape views on network capacity
You might wonder why enterprises are starting to look at this capacity-solves-problems point now rather than years or decades earlier. They say there’s both a demand-side and a supply-side answer.
On the demand side, increased componentization of applications, including the division of component hosting between the data center and the cloud, has radically increased the complexity of application workflows. Monolithic applications have simple workflows: input, process, output. Componentized applications have to move messages among their components, and each of those movements rides on network connectivity, so the network is more tightly bound to application availability and performance. Not only that, the complex workflows make it harder to figure out what’s wrong and how to fix it. Finally, remember serialization delay? Every component interface adds to it and eats into the delay budget intrinsic to every application.
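A rough sketch of how those component interfaces can eat into a delay budget, assuming every inter-component message fits in one 1500-byte frame and crosses four switch hops; the crossing counts, hop count, and 2 ms budget are hypothetical numbers chosen only to show the shape of the effect:

```python
# Hypothetical sketch: how inter-component hops consume an application's
# delay budget through serialization alone (no queuing or processing time).
# All numbers here are illustrative assumptions, not measurements.

MESSAGE_BYTES = 1500       # assume each inter-component message fits one frame
HOPS_PER_CROSSING = 4      # assumed switch hops per network crossing
DELAY_BUDGET_MS = 2.0      # assumed end-to-end response-time budget

def hop_delay_ms(message_bytes: int, link_gbps: float) -> float:
    """Serialization delay of one message at one interface, in milliseconds."""
    return message_bytes * 8 / (link_gbps * 1e9) * 1e3

# A monolithic request/response might cross the network twice; a
# componentized workflow with a longer call chain might cross it many times.
for link_gbps in (1, 10):
    for crossings, label in ((2, "monolith"), (16, "8-component chain")):
        total = crossings * HOPS_PER_CROSSING * hop_delay_ms(MESSAGE_BYTES, link_gbps)
        print(f"{link_gbps:>2} Gbps, {label:>17}: {total:.3f} ms of serialization "
              f"({total / DELAY_BUDGET_MS:.0%} of the budget)")
```

Under these assumptions, componentizing the workflow multiplies the serialization cost several times over, and moving from 1 Gbps to 10 Gbps links claws most of that budget back.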
On the supply side, the cost of network adapters on systems and interfaces on network devices doesn’t scale linearly with speed. One network engineer pointed out that the cost per bit of an interface typically falls as speed increases, up to a point, and then starts to rise. Where that curve breaks upward has shifted as technologies have improved, so building in extra capacity is more practical today. Ethernet standards have also evolved to better handle multiple paths between switches (a capability popular with enterprises that favor adding capacity to reduce opex) and different traffic priorities.
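The shape of that cost curve is easy to see with a toy calculation; the port prices below are placeholder numbers invented for illustration, not vendor pricing, so only the falling-then-rising pattern matters:

```python
# Purely illustrative: how cost per bit can fall as speed rises and then
# break upward at the newest speeds. The prices are made-up placeholders.

hypothetical_port_cost = {   # Gbps -> assumed price per port (USD)
    1: 50,
    10: 150,
    25: 300,
    100: 900,
    400: 6000,   # assume the newest speed still carries a premium
}

for gbps, price in hypothetical_port_cost.items():
    print(f"{gbps:>4} Gbps port: ${price:>5} -> ${price / gbps:6.2f} per Gbps")
```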
Then there’s AI. Interestingly, the majority of the enterprises now actively building local networks with bandwidth to burn are also early explorers of in-house AI hosting. AI in general, and model training in particular, generates a lot of server-to-server traffic, so congestion and the risk of delay or packet loss are high. Most agree that AI will need lower latency and higher network capacity, particularly during training, and that because the amount and nature of the traffic AI generates is impossible for an AI user to gauge, congestion-related issues would generate all the more complaint calls. AI traffic might also impact other applications. Thus, AI hosting is a good reason to think seriously about adding capacity to the data center network.
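To see why training traffic is different in kind, here is a back-of-envelope sketch assuming gradient synchronization via a ring all-reduce; the model size, precision, node count, and link speeds are all assumptions for illustration, not measurements of any enterprise workload:

```python
# Back-of-envelope sketch of server-to-server traffic during model training,
# assuming gradient synchronization with a ring all-reduce. Model size,
# precision, node count, and link speeds are illustrative assumptions.

PARAMS = 7e9               # assumed model size (parameters)
BYTES_PER_GRADIENT = 2     # assumed fp16 gradients
NODES = 8                  # assumed number of servers in the ring

gradient_bytes = PARAMS * BYTES_PER_GRADIENT
# Ring all-reduce: each node sends (and receives) roughly
# 2 * (N - 1) / N times the gradient volume per training step.
per_node_bytes = 2 * (NODES - 1) / NODES * gradient_bytes

for gbps in (25, 100, 400):
    seconds = per_node_bytes * 8 / (gbps * 1e9)
    print(f"{gbps:>3} Gbps links: ~{seconds:.2f} s of gradient exchange per step")
```

Even under these modest assumptions, each training step moves tens of gigabytes between servers, exactly the kind of sustained east-west load that argues for bandwidth to burn in the data center fabric.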