
That embedded telemetry feeds adaptive tuning of Dynamic Load Balancing parameters, Data Center Quantized Congestion Notification (DCQCN) settings, and failover logic without waiting for a threshold breach or manual intervention.
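To make the idea concrete, here is a minimal sketch of what telemetry-driven tuning could look like: a control loop that nudges DCQCN ECN-marking thresholds as queue depth and PFC pause telemetry change. The parameter names follow the usual DCQCN description, but the adjustment policy and thresholds are illustrative, not Aria's implementation.

```python
# Hypothetical control loop: adjust DCQCN ECN-marking thresholds from live
# telemetry instead of waiting for a static threshold breach or an operator.
# The specific policy and constants below are illustrative only.

from dataclasses import dataclass

@dataclass
class DcqcnParams:
    k_min_kb: int   # queue depth (KB) where ECN marking begins
    k_max_kb: int   # queue depth (KB) where marking probability reaches p_max
    p_max: float    # maximum marking probability

def tune(params: DcqcnParams, avg_queue_kb: float, pfc_pause_rate: float) -> DcqcnParams:
    """Mark earlier and more aggressively when queues build or PFC pauses fire;
    relax the thresholds again when the fabric is quiet."""
    if pfc_pause_rate > 0.01 or avg_queue_kb > 0.8 * params.k_max_kb:
        return DcqcnParams(max(params.k_min_kb // 2, 16), params.k_max_kb,
                           min(params.p_max * 1.5, 1.0))
    if avg_queue_kb < 0.2 * params.k_min_kb:
        return DcqcnParams(params.k_min_kb * 2, params.k_max_kb, params.p_max)
    return params

print(tune(DcqcnParams(k_min_kb=200, k_max_kb=800, p_max=0.1),
           avg_queue_kb=700, pfc_pause_rate=0.02))
```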
The platform architecture is layered. At the lowest layer, agents react in microseconds to link-level events such as transceiver flaps, rerouting leaf-spine traffic within milliseconds. At higher layers, agents make more strategic decisions about flow placement across the cluster. At the cloud layer, a large language model (LLM)-based agent surfaces correlated insights to operators in natural language, allowing them to ask questions about specific jobs or alert conditions and receive context-aware responses.
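One way to picture the split is as an event dispatcher that routes each event to the layer whose reaction time matches its urgency. The layers mirror the description above, but the class, event, and function names here are hypothetical, not Aria's code.

```python
# Illustrative sketch of the layered agent split described above; names and
# event categories are assumptions made for the example.

from enum import Enum

class Layer(Enum):
    LINK = "link"        # microsecond detection, millisecond reroute
    CLUSTER = "cluster"  # strategic flow placement across the fabric
    CLOUD = "cloud"      # LLM-backed operator Q&A over correlated telemetry

def dispatch(event: str) -> Layer:
    """Route an event to the layer that can act on it fast enough."""
    if event in {"transceiver_flap", "link_down"}:
        return Layer.LINK      # handled locally, no round trip to the cloud
    if event in {"ecmp_imbalance", "flow_collision"}:
        return Layer.CLUSTER   # re-place flows across leaf-spine paths
    return Layer.CLOUD         # surface to operators in natural language

print(dispatch("transceiver_flap"))  # -> Layer.LINK
```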
Karam argued that simply bolting an LLM onto an existing architecture does not deliver the same result. “If you ask it to do anything, it could hallucinate and bring down the network,” he said. “It doesn’t have any of the context or the data that’s required for this approach to be made safe.”
Aria also exposes a Model Context Protocol (MCP) server, allowing external systems such as job schedulers and LLM routers to query network state directly and integrate it into their own decision-making.
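As a rough illustration, the sketch below shows the shape of a tools/call request an external scheduler might send to such a server. MCP is JSON-RPC 2.0 under the hood, but the tool name "get_fabric_state", its arguments, and the omission of the session handshake are assumptions for the example, not a documented Aria interface.

```python
# Hedged sketch of an MCP tools/call payload a job scheduler might send.
# Tool name and arguments are hypothetical; transport and handshake omitted.

import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_fabric_state",                 # hypothetical tool name
        "arguments": {"job_id": "train-llm-0427"},  # hypothetical argument
    },
}

print(json.dumps(request, indent=2))
# A scheduler could fold the returned congestion and link-health summary into
# its own placement decision before launching the next job.
```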
MFU and token efficiency as the target metrics
Traditional networking is often evaluated in terms of bandwidth and latency. Aria is centering its platform on two metrics: Model FLOPS Utilization (MFU) and token efficiency. MFU is defined as the ratio of achieved FLOPS per accelerator to the theoretical peak. In practice, Karam said, MFU for training workloads typically runs between 33% and 45%, and inference often comes in below 30%.
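The metric itself is a simple ratio. The snippet below shows the calculation with made-up numbers chosen to land inside the 33% to 45% range cited for training; they are illustrative, not measurements from Aria's platform.

```python
# Minimal sketch: Model FLOPS Utilization (MFU) as achieved / peak FLOPS
# per accelerator. The figures below are illustrative, not real measurements.

def mfu(achieved_tflops_per_gpu: float, peak_tflops_per_gpu: float) -> float:
    """MFU = achieved FLOPS per accelerator / theoretical peak FLOPS."""
    return achieved_tflops_per_gpu / peak_tflops_per_gpu

# Example: a training step sustaining 390 TFLOPS on an accelerator with a
# 989 TFLOPS peak works out to roughly 39% MFU.
print(f"{mfu(390, 989):.1%}")  # -> 39.4%
```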
“The network has a major impact on the MFU, and therefore the token efficiency, because the network touches every aspect, every other component in your cluster,” Karam said.