
Support for minimum-size packets allows them to be streamed at full bandwidth, a capability essential for efficient communication in scientific and computational workloads. It is particularly important for scale-up networks, where GPU-to-switch-to-GPU communication happens in a single hop.
Lossless Ethernet gets an ‘Ultra’ boost
Another area of optimization for the Tomahawk Ultra is lossless Ethernet. Broadcom has integrated support for a pair of capabilities that were first fully defined in the Ultra Ethernet Consortium’s (UEC) 1.0 specification in June.
The lossless Ethernet support is enabled via:
- Link Layer Retry (LLR): The chip detects transmission errors using Forward Error Correction (FEC), and when an error exceeds what FEC can correct, LLR lets the switch request a retry of that packet at the link layer so it is retransmitted rather than dropped, Del Vecchio explained. (A simplified sketch of this retry loop appears after the list.)
- Credit-Based Flow Control (CBFC): CBFC prevents packet drops due to buffer overflow. If the receiver doesn’t have space for an incoming packet, the switch signals the sender to pause, Del Vecchio said; once space frees up, it notifies the sender that a certain number of packets can be sent. (A simplified sketch of this credit accounting also appears after the list.)
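Conceptually, LLR behaves like a small per-link retry loop: the sender keeps each frame in a replay buffer until the link partner accepts it, and a frame that arrives with errors the FEC stage cannot correct is retransmitted instead of dropped. The Python below is a minimal sketch of that idea under simplified assumptions; the class names and the random error model are hypothetical and are not Broadcom's implementation.

```python
import random

class LlrReceiver:
    """Models the link partner's FEC stage: frames with uncorrectable
    errors are rejected (triggering a retry) instead of silently dropped."""
    def __init__(self, error_rate=0.2):
        self.error_rate = error_rate
        self.received = []

    def accept(self, payload):
        if random.random() < self.error_rate:
            return False             # FEC could not correct -> request a retry
        self.received.append(payload)
        return True                  # clean frame -> acknowledged

class LlrSender:
    """Keeps each frame in a replay buffer until the receiver accepts it."""
    def __init__(self, receiver):
        self.receiver = receiver
        self.replay_buffer = {}

    def send(self, seq, payload):
        self.replay_buffer[seq] = payload
        while not self.receiver.accept(self.replay_buffer[seq]):
            pass                     # link-level retransmit of the same frame
        del self.replay_buffer[seq]  # acknowledged: the frame can be released

random.seed(7)
rx = LlrReceiver()
tx = LlrSender(rx)
for i in range(5):
    tx.send(i, f"frame-{i}")
print(rx.received)   # all five frames arrive in order; any bad frame was simply retried
```

The point of the sketch is that recovery happens hop by hop at the link layer, so an uncorrectable error never has to be handled by a higher-level transport.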
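CBFC can be pictured as the receiver lending out buffer credits: the sender may transmit only while it holds credits, and a credit flows back each time the receiver drains a buffer, so the buffer can never overflow and no packet is dropped. The sketch below is an illustrative simplification with hypothetical names (the credit exchange is modeled as a shared counter rather than explicit control frames), not the Tomahawk Ultra's actual mechanism.

```python
from collections import deque

class CreditReceiver:
    """Grants one credit per free buffer slot and returns credits as it drains."""
    def __init__(self, buffer_slots=4):
        self.buffer = deque()
        self.free_slots = buffer_slots      # initial credit grant to the sender

    def accept(self, packet):
        self.buffer.append(packet)          # guaranteed to fit: sender spent a credit

    def drain_one(self):
        packet = self.buffer.popleft()
        self.free_slots += 1                # buffer freed -> credit returned to sender
        return packet

class CreditSender:
    """Transmits only while it holds credits; otherwise it pauses (nothing is dropped)."""
    def __init__(self, receiver):
        self.receiver = receiver
        self.pending = deque()

    def enqueue(self, packet):
        self.pending.append(packet)

    def transmit(self):
        sent = 0
        while self.pending and self.receiver.free_slots > 0:
            self.receiver.free_slots -= 1   # spend a credit
            self.receiver.accept(self.pending.popleft())
            sent += 1
        return sent

rx = CreditReceiver(buffer_slots=4)
tx = CreditSender(rx)
for i in range(10):
    tx.enqueue(f"pkt-{i}")

print(tx.transmit())    # 4 -> sender pauses when credits run out, nothing is dropped
print(rx.drain_one())   # pkt-0 drained; one credit flows back to the sender
print(tx.transmit())    # 1 -> the returned credit allows exactly one more packet
```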
In-network collectives (INC) reduce network operations
The Tomahawk Ultra also accelerates HPC and AI operations through a feature known as in-network collectives (INC).
In-network collectives are operations in which multiple compute units, such as GPUs, need to share and combine their computational results. For example, in an “all-reduce” operation, GPUs computing different parts of a problem need to average their results across the network. With Tomahawk Ultra, instead of the GPUs sending data back and forth and performing the reduction separately, the switch itself has hardware that performs it: the INC capability receives data from all GPUs, carries out operations like averaging directly in the network and then propagates the final result back to all the GPUs.
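To make the traffic savings concrete, the sketch below contrasts the two approaches for an all-reduce that averages one value per GPU: without INC, every GPU's data crosses the network to every other GPU; with an in-network reduction, each GPU sends one value up to the switch, which averages and sends a single result back down. The function names and the message accounting are illustrative assumptions, not Broadcom's API.

```python
def all_reduce_peer_to_peer(gpu_values):
    """Each GPU exchanges its partial result with every other GPU, then averages locally."""
    n = len(gpu_values)
    messages = n * (n - 1)                 # every GPU sends its value to every other GPU
    result = sum(gpu_values) / n
    return result, messages

def all_reduce_in_network(gpu_values):
    """The switch reduces in-network: one message up per GPU, one result back per GPU."""
    n = len(gpu_values)
    messages = 2 * n                       # GPU -> switch, then switch -> GPU
    switch_sum = sum(gpu_values)           # averaging performed inside the switch
    result = switch_sum / n
    return result, messages

partials = [1.0, 2.0, 3.0, 4.0]            # partial results from 4 GPUs
print(all_reduce_peer_to_peer(partials))   # (2.5, 12) -> same answer, 12 transfers
print(all_reduce_in_network(partials))     # (2.5, 8)  -> same answer, 8 transfers
```

The answer is identical either way; what changes is the traffic, which grows roughly with the square of the GPU count in the peer-to-peer case but only linearly when the switch performs the reduction.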
The benefits are twofold. “You’ve offloaded some computation to the network,” Del Vecchio explained. “More importantly, you’ve significantly reduced the bandwidth of the data transfers in the network.”