“When you need to get out of that rack and you need to connect multiple of these racks together, you need to scale out. This is where this NIC gets used,” Hassan explained.
The NIC ships in two SerDes configurations: the 100G version provides eight 100G lanes, while the 200G version offers four 200G lanes. Both deliver 800G of aggregate network bandwidth and connect to the host through 16 lanes of PCIe Gen 6. The dual-configuration strategy accommodates both current 100G ecosystems and emerging 200G deployments.
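As a rough check on those figures, here is a small Python sketch (illustrative only, not Broadcom tooling) showing that both SerDes options land at the same 800G aggregate, and that a 16-lane PCIe Gen 6 host interface offers roughly 1 Tbps of raw bandwidth per direction to carry it.

```python
# Illustrative sketch only: both SerDes options reach the same aggregate rate.
configs = {
    "100G SerDes": {"lanes": 8, "gbps_per_lane": 100},
    "200G SerDes": {"lanes": 4, "gbps_per_lane": 200},
}

for name, cfg in configs.items():
    total = cfg["lanes"] * cfg["gbps_per_lane"]
    print(f"{name}: {cfg['lanes']} x {cfg['gbps_per_lane']}G = {total}G aggregate")

# PCIe Gen 6 runs at 64 GT/s per lane, so a x16 interface carries roughly
# 1 Tbps of raw bandwidth per direction -- headroom for 800G of network traffic.
pcie_gen6_x16_gbps = 64 * 16
print(f"PCIe Gen 6 x16 raw: ~{pcie_gen6_x16_gbps} Gbps per direction")
```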

Breaking RDMA’s architectural constraints
Traditional RDMA protocols carry design limitations from their origins two to three decades ago. They lack multipathing support, cannot handle out-of-order packet delivery and rely on Go-Back-N retransmission. Under Go-Back-N, a single dropped packet forces retransmission of that packet plus every subsequent packet in the sequence.
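To make the amplification concrete, here is a minimal Python sketch (not Broadcom code) of what a Go-Back-N sender re-sends when a single packet in an eight-packet window is lost.

```python
# Sketch only: how Go-Back-N amplifies a single loss. With eight packets in
# flight, losing packet 3 forces packets 3..8 to be re-sent, even though
# packets 4..8 already arrived at the receiver.

def go_back_n_retransmits(window, lost):
    """Return the packets a Go-Back-N sender re-sends for the earliest loss."""
    first_loss = min(lost)
    return [seq for seq in window if seq >= first_loss]

window = list(range(1, 9))                        # packets 1..8 in flight
print(go_back_n_retransmits(window, lost={3}))    # [3, 4, 5, 6, 7, 8]
```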
These limitations become critical at scale. Network congestion increases packet loss. Go-Back-N amplifies the problem by flooding already-congested links with redundant retransmissions. Thor Ultra implements four architectural changes to break these constraints.
- Packet-level multipathing. The NIC divides its eight 100G lanes into separate network planes, and packets from a single message can be distributed across all planes for load balancing (see the spraying sketch after this list). Standard RDMA requires all packets in a flow to traverse a single path, preventing this optimization.
- Out-of-order data placement. Thor Ultra writes packets directly to XPU memory as they arrive, regardless of sequence. Rather than buffering packets until they can be delivered in order, the NIC tracks packet state and places each one into its correct memory location immediately (see the placement sketch after this list).
- Selective acknowledgment and retransmission. Thor Ultra replaces Go-Back-N with selective acknowledgment. When packets 3 and 6 are missing from a sequence of 1 through 8, the NIC sends a SACK indicating exactly which packets arrived and which are missing, and the sender retransmits only packets 3 and 6 (worked through in the SACK sketch after this list).
- Programmable congestion control. The NIC implements a hardware pipeline that supports multiple congestion control algorithms. Two schemes are currently available: receiver-based congestion control, in which receivers send credits to senders, and sender-based approaches, in which senders measure round-trip time to determine transmission rates (both styles are sketched after this list). The programmable pipeline can accommodate future UEC specification revisions or custom hyperscaler algorithms.
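The following is a minimal sketch of the packet-spraying idea behind packet-level multipathing, assuming a hypothetical eight-plane fabric and a fixed packet size; the names and numbers are illustrative, not Thor Ultra parameters.

```python
# Sketch only: round-robin "spraying" of one message's packets across planes.
NUM_PLANES = 8            # e.g. one plane per 100G lane (assumed mapping)
PACKET_BYTES = 4096       # illustrative packet size

def spray(message_len):
    """Assign each packet of a message to a network plane, round-robin."""
    num_packets = -(-message_len // PACKET_BYTES)   # ceiling division
    return [(seq, seq % NUM_PLANES) for seq in range(num_packets)]

# A 64 KiB message spreads its 16 packets evenly over all 8 planes.
for seq, plane in spray(64 * 1024):
    print(f"packet {seq:2d} -> plane {plane}")
```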
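Next, a sketch of out-of-order data placement under the assumption that each sequence number maps to a fixed offset in the destination buffer; the `place` helper and buffer layout are hypothetical, not Broadcom's interface.

```python
# Sketch only: each packet is written straight to its final location the
# moment it arrives, so no reorder buffer is needed on the NIC.
PACKET_BYTES = 4096

def place(buffer: bytearray, seq: int, payload: bytes) -> None:
    """Write a packet's payload directly to its offset in XPU memory."""
    offset = seq * PACKET_BYTES
    buffer[offset:offset + len(payload)] = payload

xpu_buffer = bytearray(8 * PACKET_BYTES)
arrival_order = [0, 3, 1, 7, 2, 6, 4, 5]     # packets arrive out of order
for seq in arrival_order:
    place(xpu_buffer, seq, bytes([seq]) * PACKET_BYTES)
```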
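The selective-acknowledgment example from the list above, packets 3 and 6 missing from a sequence of 1 through 8, reduces to a few lines of set arithmetic; this is a toy model of the bookkeeping, not the wire format.

```python
# Sketch only: the receiver reports exactly which packets arrived, and the
# sender retransmits only the gaps.
sent     = set(range(1, 9))          # packets 1..8
received = sent - {3, 6}             # 3 and 6 were dropped in the network

sack = sorted(received)              # receiver's selective acknowledgment
retransmit = sorted(sent - set(sack))
print("SACK:", sack)                 # [1, 2, 4, 5, 7, 8]
print("retransmit:", retransmit)     # [3, 6] -- vs. 3..8 under Go-Back-N
```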
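Finally, a toy sketch of the two congestion-control styles the pipeline currently supports, receiver-based credits and sender-based RTT measurement. The rate formulas and constants are assumptions for illustration, not the UEC or Broadcom algorithms.

```python
# Sketch only: two congestion-control styles reduced to toy rate calculations.

def receiver_based_rate(credits_granted_bytes: int, interval_s: float) -> float:
    """Receiver-driven: send only as fast as the receiver grants credits."""
    return credits_granted_bytes / interval_s              # bytes per second

def sender_based_rate(base_rtt_s: float, measured_rtt_s: float,
                      current_rate: float) -> float:
    """Sender-driven: back off as measured RTT inflates above the base RTT."""
    if measured_rtt_s <= base_rtt_s:
        return current_rate * 1.05                         # gently probe for more
    return current_rate * (base_rtt_s / measured_rtt_s)    # ease off under queuing

print(receiver_based_rate(credits_granted_bytes=1_000_000, interval_s=0.001))
print(sender_based_rate(base_rtt_s=10e-6, measured_rtt_s=25e-6,
                        current_rate=100e9))
```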
Performance and power
Thor Ultra consumes approximately 50 watts, compared with 125-150W for products like Nvidia’s BlueField-3 DPU. The power difference stems from architectural choices rather than process technology.
DPUs target multiple use cases including front-end networking (requiring deep packet inspection and encryption), storage offload and security functions. They incorporate ARM cores, large memory subsystems and extensive acceleration engines. Thor Ultra strips out everything not required for AI backend networking.