Beyond single-model AI: How architectural design drives reliable multi-agent orchestration



We’re seeing AI evolve fast. It’s no longer just about building a single, super-smart model. The real power, and the exciting frontier, lies in getting multiple specialized AI agents to work together. Think of them as a team of expert colleagues, each with their own skills — one analyzes data, another interacts with customers, a third manages logistics, and so on. Getting this team to collaborate seamlessly, as envisioned by various industry discussions and enabled by modern platforms, is where the magic happens.

But let’s be real: Coordinating a bunch of independent, sometimes quirky, AI agents is hard. The hard part isn’t building cool individual agents; it’s the messy middle bit — the orchestration — that can make or break the system. When you have agents that rely on each other, act asynchronously and can fail independently, you’re not just building software; you’re conducting a complex orchestra. This is where solid architectural blueprints come in. We need patterns designed for reliability and scale right from the start.

The knotty problem of agent collaboration

Why is orchestrating multi-agent systems such a challenge? Well, for starters:

  1. They’re independent: Unlike functions being called in a program, agents often have their own internal loops, goals and states. They don’t just wait patiently for instructions.
  2. Communication gets complicated: It’s not just Agent A talking to Agent B. Agent A might broadcast info that Agents C and D care about, while Agent B is waiting for a signal from E before telling F something.
  3. They need to have a shared brain (state): How do they all agree on the “truth” of what’s happening? If Agent A updates a record, how does Agent B know about it reliably and quickly? Stale or conflicting information is a killer.
  4. Failure is inevitable: An agent crashes. A message gets lost. An external service call times out. When one part of the system falls over, you don’t want the whole thing grinding to a halt or, worse, doing the wrong thing.
  5. Consistency can be difficult: How do you ensure that a complex, multi-step process involving several agents actually reaches a valid final state? This isn’t easy when operations are distributed and asynchronous.

Simply put, the combinatorial complexity explodes as you add more agents and interactions. Without a solid plan, debugging becomes a nightmare, and the system feels fragile.

Picking your orchestration playbook

How you decide agents coordinate their work is perhaps the most fundamental architectural choice. Here are a few frameworks:

  • The conductor (hierarchical): This is like a traditional symphony orchestra. You have a main orchestrator (the conductor) that dictates the flow, tells specific agents (musicians) when to perform their piece, and brings it all together.
    • This allows for: Clear workflows, execution that is easy to trace, straightforward control; it is simpler for smaller or less dynamic systems.
    • Watch out for: The conductor can become a bottleneck or a single point of failure. This scenario is less flexible if you need agents to react dynamically or work without constant oversight.
  • The jazz ensemble (federated/decentralized): Here, agents coordinate more directly with each other based on shared signals or rules, much like musicians in a jazz band improvising based on cues from each other and a common theme. There might be shared resources or event streams, but no central boss micro-managing every note.
    • This allows for: Resilience (if one musician stops, the others can often continue), scalability, adaptability to changing conditions, more emergent behaviors.
    • What to consider: It can be harder to understand the overall flow, debugging is tricky (“Why did that agent do that then?”) and ensuring global consistency requires careful design.

Many real-world multi-agent systems (MAS) end up being a hybrid — perhaps a high-level orchestrator sets the stage; then groups of agents within that structure coordinate decentrally.
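To make the conductor pattern concrete, here is a minimal in-process Python sketch. The agent names, the dict-based task format and the `Conductor` class are all illustrative assumptions, not a real framework; a production system would run agents as separate services.

```python
# Minimal sketch of the "conductor" (hierarchical) pattern: one central
# orchestrator owns the workflow and calls specialist agents in order.

class Agent:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def run(self, task):
        return self.handler(task)


class Conductor:
    """Central orchestrator: dictates the flow and records a trace."""

    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.name] = agent

    def execute(self, task, steps):
        trace = []
        for step in steps:                 # explicit, easy-to-trace flow
            task = self.agents[step].run(task)
            trace.append((step, task))
        return task, trace


conductor = Conductor()
conductor.register(Agent("analyze", lambda t: {**t, "analyzed": True}))
conductor.register(Agent("respond", lambda t: {**t, "reply": "done"}))

result, trace = conductor.execute({"order_id": 42}, ["analyze", "respond"])
```

The upside is visible in the code: the workflow lives in one place (`steps`), so tracing and control are trivial. The downside is equally visible: every task flows through `Conductor`, making it the bottleneck and single point of failure the pattern description warns about.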

Managing the collective brain (shared state) of AI agents

For agents to collaborate effectively, they often need a shared view of the world, or at least the parts relevant to their task. This could be the current status of a customer order, a shared knowledge base of product information or the collective progress towards a goal. Keeping this “collective brain” consistent and accessible across distributed agents is tough.

Architectural patterns we lean on:

  • The central library (centralized knowledge base): A single, authoritative place (like a database or a dedicated knowledge service) where all shared information lives. Agents check books out (read) and return them (write).
    • Pro: Single source of truth, easier to enforce consistency.
    • Con: Can get hammered with requests, potentially slowing things down or becoming a choke point. Must be seriously robust and scalable.
  • Distributed notes (distributed cache): Agents keep local copies of frequently needed info for speed, backed by the central library.
    • Pro: Faster reads.
    • Con: How do you know if your copy is up-to-date? Cache invalidation and consistency become significant architectural puzzles.
  • Shouting updates (message passing): Instead of agents constantly asking the library, the library (or other agents) shouts out “Hey, this piece of info changed!” via messages. Agents listen for updates they care about and update their own notes.
    • Pro: Agents are decoupled, which is good for event-driven patterns.
    • Con: Ensuring everyone gets the message and handles it correctly adds complexity. What if a message is lost?

The right choice depends on how critical up-to-the-second consistency is, versus how much performance you need.
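A common way to combine the central library and distributed notes patterns is to attach a version number to each record, so an agent can detect a stale local copy. This sketch is illustrative only (in-process dicts standing in for a real database and cache service):

```python
# Sketch of a versioned shared store with per-agent local caches: agents
# detect staleness by comparing their cached version against the store's.

class SharedStore:
    """The 'central library': single source of truth, versioned writes."""

    def __init__(self):
        self._data, self._version = {}, {}

    def write(self, key, value):
        self._data[key] = value
        self._version[key] = self._version.get(key, 0) + 1

    def read(self, key):
        return self._data[key], self._version[key]


class AgentCache:
    """'Distributed notes': a local copy, refreshed when out of date."""

    def __init__(self, store):
        self.store, self.local = store, {}

    def get(self, key):
        value, version = self.local.get(key, (None, -1))
        _, latest = self.store.read(key)
        if version < latest:               # stale copy: refresh from the store
            value, version = self.store.read(key)
            self.local[key] = (value, version)
        return value


store = SharedStore()
store.write("order-7", "pending")
cache = AgentCache(store)

first = cache.get("order-7")               # reads "pending", caches it
store.write("order-7", "shipped")          # another agent updates the record
updated = cache.get("order-7")             # stale copy detected, refreshed
```

Note the trade-off: this version check still requires a round trip to the store on every read, which is why real systems often push invalidations out via messages instead of polling.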

Building for when stuff goes wrong (error handling and recovery)

It’s not if an agent fails, it’s when. Your architecture needs to anticipate this.

Think about:

  • Watchdogs (supervision): This means having components whose job it is to simply watch other agents. If an agent goes quiet or starts acting weird, the watchdog can try restarting it or alerting the system.
  • Try again, but be smart (retries and idempotency): If an agent’s action fails, it should often just try again. But, this only works if the action is idempotent. That means doing it five times has the exact same result as doing it once (like setting a value, not incrementing it). If actions aren’t idempotent, retries can cause chaos.
  • Cleaning up messes (compensation): If Agent A did something successfully, but Agent B (a later step in the process) failed, you might need to “undo” Agent A’s work. Patterns like Sagas help coordinate these multi-step, compensable workflows.
  • Knowing where you were (workflow state): Keeping a persistent log of the overall process helps. If the system goes down mid-workflow, it can pick up from the last known good step rather than starting over.
  • Building firewalls (circuit breakers and bulkheads): These patterns prevent a failure in one agent or service from overloading or crashing others, containing the damage.
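The retry-plus-idempotency point above can be shown in a few lines of Python. This is a hedged sketch (the `retry` helper and the flaky action are invented for illustration): the key detail is that the action *sets* a value rather than incrementing one, so repeating it is safe.

```python
import time


def retry(action, attempts=3, base_delay=0.01):
    """Retry a failing action with exponential backoff.

    Safe only if `action` is idempotent: running it twice must leave
    the system in the same state as running it once.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise                       # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)


# Idempotent action: *setting* a status, not incrementing a counter.
state = {}
calls = {"n": 0}


def flaky_set_status():
    calls["n"] += 1
    if calls["n"] < 3:                      # simulate two transient failures
        raise ConnectionError("transient failure")
    state["order-7"] = "confirmed"          # same end state however often it runs
    return state["order-7"]


result = retry(flaky_set_status)
```

Had the action been `state["count"] += 1` instead, those two retried failures could have produced three increments — exactly the chaos the pattern warns about.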

Making sure the job gets done right (consistent task execution)

Even with individual agent reliability, you need confidence that the entire collaborative task finishes correctly.

Consider:

  • Atomic-ish operations: While true ACID transactions are hard with distributed agents, you can design workflows to behave as close to atomically as possible using patterns like Sagas.
  • The unchanging logbook (event sourcing): Record every significant action and state change as an event in an immutable log. This gives you a perfect history, makes state reconstruction easy, and is great for auditing and debugging.
  • Agreeing on reality (consensus): For critical decisions, you might need agents to agree before proceeding. This can involve simple voting mechanisms or more complex distributed consensus algorithms if trust or coordination is particularly challenging.
  • Checking the work (validation): Build steps into your workflow to validate the output or state after an agent completes its task. If something looks wrong, trigger a reconciliation or correction process.
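The "unchanging logbook" idea is simple enough to sketch directly. In this illustrative example (agent names and payloads are made up), state is never stored as a mutable record; it is reconstructed by replaying the append-only event log:

```python
# Sketch of event sourcing for a multi-agent workflow: every significant
# action is appended to an immutable log, and current state is rebuilt
# by replaying the history from the beginning.

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    agent: str
    action: str
    payload: dict


class EventLog:
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)          # append-only: events never change

    def replay(self):
        """Rebuild the workflow state from the full history."""
        state = {}
        for e in self._events:
            state.update(e.payload)
        return state


log = EventLog()
log.append(Event("pricing", "quoted", {"price": 99}))
log.append(Event("billing", "charged", {"paid": True}))

snapshot = log.replay()
```

Because the log is the source of truth, you get auditing and debugging for free: to understand why the system is in a given state, you read the events that produced it.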

The best architecture needs the right foundation

  • The post office (message queues/brokers like Kafka or RabbitMQ): This is absolutely essential for decoupling agents. They send messages to the queue; agents interested in those messages pick them up. This enables asynchronous communication, handles traffic spikes and is key for resilient distributed systems.
  • The shared filing cabinet (knowledge stores/databases): This is where your shared state lives. Choose the right type (relational, NoSQL, graph) based on your data structure and access patterns. This must be performant and highly available.
  • The X-ray machine (observability platforms): Logs, metrics, tracing – you need these. Debugging distributed systems is notoriously hard. Being able to see exactly what every agent was doing, when and how they were interacting is non-negotiable.
  • The directory (agent registry): How do agents find each other or discover the services they need? A central registry helps manage this complexity.
  • The playground (containerization and orchestration like Kubernetes): This is how you actually deploy, manage and scale all those individual agent instances reliably.

How do agents chat? (Communication protocol choices)

The way agents talk impacts everything from performance to how tightly coupled they are.

  • Your standard phone call (REST/HTTP): This is simple, works everywhere and is good for basic request/response. But it can feel a bit chatty and is less efficient for high volumes or complex data structures.
  • The structured conference call (gRPC): This uses efficient data formats, supports different call types including streaming and is type-safe. It is great for performance but requires defining service contracts.
  • The bulletin board (message queues — protocols like AMQP, MQTT): Agents post messages to topics; other agents subscribe to topics they care about. This is asynchronous, highly scalable and completely decouples senders from receivers.
  • Direct line (RPC — less common): Agents call functions directly on other agents. This is fast, but creates very tight coupling — agents need to know exactly who they’re calling and where they are.

Choose the protocol that fits the interaction pattern. Is it a direct request? A broadcast event? A stream of data?
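For the broadcast-event case, the "bulletin board" decoupling can be sketched as a tiny in-process topic bus. This stands in for an AMQP/MQTT-style broker purely for illustration — it shows the shape of publish/subscribe, not a real protocol implementation:

```python
from collections import defaultdict


class TopicBus:
    """Minimal in-process 'bulletin board': publishers post to topics,
    subscribers receive only the topics they subscribed to."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subs[topic]:   # sender never knows the receivers
            handler(message)


bus = TopicBus()
received = []

bus.subscribe("orders.shipped", received.append)
bus.publish("orders.shipped", {"order_id": 7})   # delivered to the subscriber
bus.publish("orders.created", {"order_id": 8})   # no subscribers: dropped
```

The publisher of `orders.shipped` has no idea who (if anyone) is listening — which is exactly the decoupling that makes asynchronous messaging scale, and also why delivery guarantees and lost-message handling become the hard part in a real broker.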

Putting it all together

Building reliable, scalable multi-agent systems isn’t about finding a magic bullet; it’s about making smart architectural choices based on your specific needs. Will you lean more hierarchical for control or federated for resilience? How will you manage that crucial shared state? What’s your plan for when (not if) an agent goes down? What infrastructure pieces are non-negotiable?

It’s complex, yes, but by focusing on these architectural blueprints — orchestrating interactions, managing shared knowledge, planning for failure, ensuring consistency and building on a solid infrastructure foundation — you can tame the complexity and build the robust, intelligent systems that will drive the next wave of enterprise AI.

Nikhil Gupta is the AI product management leader/staff product manager at Atlassian.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Six vendor platforms to watch

Most recently, Extreme (Nasdaq:EXTR) added an AI service agent to Platform ONE, as well as a new dashboard to simplify network and security operations. 3. Fortinet Security Platform: Integration is built-in The Fortinet Security Fabric features one operating system (FortiOS), a unified agent (FortiClient), one management console (FortiManager), one data

Read More »

VMware customers in Europe face up to 1,500% price increases under Broadcom ownership

Regulatory storm brewing The pricing crisis has triggered formal regulatory attention across Europe. Germany’s VOICE IT customer association has filed a complaint with the European Commission, while ECCO explicitly calls for regulatory intervention, including reinstating previous contracts and suspending Broadcom’s ongoing litigation. “Unless Broadcom promptly implements critical changes, the company’s

Read More »

DOE Allots Funding to Strengthen Puerto Rico Power Grid

The U.S. Department of Energy (DOE) has set aside $365 million for grid resilience projects in Puerto Rico. The funding, distributed through the Puerto Rico Resilience Fund (PR-ERF), will go toward practical fixes and emergency activities to alleviate the current crisis, which was reflected in a recent island-wide blackout, the DOE said in a press release. “By redirecting these funds, we will ensure taxpayer dollars are used to strengthen access to affordable, reliable, and secure power, benefiting more citizens as quickly as possible. This strategic shift allows us to address the root causes of the grid’s instability, strengthening the grid’s fragile infrastructure and delivering lasting relief for Puerto Rico”, Chris Wright, U.S. Secretary of Energy, said. “Puerto Rico is facing an energy emergency that requires we act now and deliver immediate solutions. Our communities, businesses, and healthcare facilities cannot afford to wait years, nor can we rely on piecemeal approaches with limited results. Rather than impacting a few customers, deploying these funds for urgent projects that improve the resiliency and reliability of our grid will have widespread, lasting benefits for all 3.2 million Americans in Puerto Rico”, Jenniffer González-Colón, Puerto Rico Governor, said. The Biden administration initially granted this $365 million funding in December 2024 to aid the installation of rooftop solar and battery storage, with construction set to commence in 2026. The DOE is shifting its priorities regarding these awards and will reallocate the funding to support technologies that enhance system flexibility and responsiveness, power flow and management, component durability, supply security, and safety, the agency said.  To contact the author, email [email protected] WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. 
Off-topic, inappropriate or insulting comments will be removed. MORE

Read More »

Australia Pacific LNG Cuts Price in Massive Sinopec Supply Deal

Australia Pacific LNG agreed to cut the price of liquefied natural gas sold under a major contract with China’s Sinopec. The price review resulted in a reduction in the oil-linked contract slope from Jan. 1, 2025, Origin Energy Ltd. — which holds a 27.5% stake in the export project, said Friday. The Sydney-based company sees a reduction in its underlying earnings from the Australia Pacific LNG plant of A$55 million ($35 million) in the six months through June 2025. Outside of the US, most of the LNG sold under long-term contracts is linked to prices for crude. Origin in October said that Sinopec had sent a price review notice to APLNG. China, the world’s biggest buyer of LNG, is seeking lower prices as the market faces an oversupply in the second half of this decade due to more projects coming online. The nation has also been cutting back imports of the super-chilled fuel in the past few months because of softer domestic demand and strong pipeline gas imports. Sinopec’s LNG supply contract ends in December 2035, with one final price review in 2030 at APLNG’s discretion, Origin said. The Chinese company is also an APLNG shareholder — with a 25% stake — while ConocoPhillips holds the remaining 47.5%. The contract started in 2016, and was for 7.6 million tons a year, making it one of the largest globally, according to the International Group of LNG Importers. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed. MORE FROM THIS AUTHOR Bloomberg

Read More »

ICYMI: President Trump Signs Executive Orders to Usher in a Nuclear Renaissance, Restore Gold Standard Science

WASHINGTON—Today, President Trump signed several key executive orders to usher in a nuclear renaissance and restore America’s gold standard in science and innovation, directing the Department of Energy to take a leading role in unleashing the American nuclear renaissance. President Trump is taking decisive action to strengthen scientific discovery in America, rebuild public trust in science, and accelerate advanced nuclear technologies. After decades of stagnation and shuttered reactors, President Trump is providing a path forward for nuclear innovation. Today’s executive orders allow for reactor design testing at the Department of Energy’s (DOE) National Labs, clear the way for construction on federal lands to protect national and economic security, and remove regulatory barriers by requiring the Nuclear Regulatory Commission to issue timely licensing decisions. “For too long, America’s nuclear energy industry has been stymied by red tape and outdated government policies, but thanks to President Trump, the American nuclear renaissance is finally here,”Energy Secretary Chris Wright said. “With the emergence of AI and President Trump’s pro-American manufacturing policies at work, American civil nuclear energy is being unleashed at the perfect time. Nuclear has the potential to be America’s greatest source of energy addition. It works whether the wind is blowing, or the sun is shining, is possible anywhere and at different scales. President Trump’s executive orders today unshackle our civil nuclear energy industry and ensure it can meet this critical moment.” “Over the last 30 years, we stopped building nuclear reactors in America – that ends now. Today’s executive orders are the most significant nuclear regulatory reform actions taken in decades. We are restoring a strong American nuclear industrial base, rebuilding a secure and sovereign domestic nuclear fuel supply chain, and leading the world towards a future fueled by American nuclear energy. 
These actions are critical to American energy independence and

Read More »

Energy Secretary Issues Emergency Order to Secure Grid Reliability Ahead of Summer Months

WASHINGTON— U.S. Secretary of Energy Chris Wright issued an emergency order today to minimize the risk of blackouts and address critical grid security issues in the Midwestern region of the United States ahead of the high electricity demand expected this summer. Secretary Wright’s order directs the Midcontinent Independent System Operator (MISO), in coordination with Consumers Energy, to ensure that the 1,560 megawatt (MW) J.H. Campbell coal-fired power plant in West Olive, Michigan remains available for operation, minimizing any potential capacity shortfall that could lead to unnecessary power outages. The Campbell Plant was scheduled to shut down on May 31, which is 15 years before the end of its scheduled design life. “Today’s emergency order ensures that Michiganders and the greater Midwest region do not lose critical power generation capability as summer begins and electricity demand regularly reach high levels,” Secretary Wright said. “This administration will not sit back and allow dangerous energy subtraction policies threaten the resiliency of our grid and raise electricity prices on American families. With President Trump’s leadership, the Energy Department is hard at work securing the American people access to affordable, reliable, and secure energy that powers their lives regardless of whether the wind is blowing, or the sun is shining.” The emergency order, which is issued by the Office of Cybersecurity, Energy Security, and Emergency Response (CESER), is authorized by Section 202(c) of the Federal Power Act and is in accordance with President Trump’s Executive Order: Declaring a National Energy Emergency. It will ensure the power generation availability in the region does not dip below 2024 capacity levels. 
BACKGROUND: Heading into the summer months, the North American Electric Reliability Corporation (NERC) has warned the region served by MISO “is at elevated risk of operating reserve shortfalls during periods of high demand,” particularly during the summer

Read More »

WTI Settles at $61.53 in Light Trade

Oil drifted higher in thin pre-holiday trading as investors’ conviction that the US and Iran can reach a nuclear deal waned while strong US data buoyed a shaky demand picture. West Texas Intermediate edged up by 0.5% to settle above $61 a barrel, with volumes trending lower ahead of Monday’s Memorial Day holiday. The US and Iran concluded a fifth round of nuclear talks in Rome that yielded “some but not conclusive progress,” according to Iranian Foreign Minister Abbas Araghchi. A wrong turn in the negotiations, which have spurred criticism from several high-ranking Iranian officials, may lead to tighter sanctions, crimping flows from the OPEC member. Meanwhile, strong US economic data helped erase an earlier rout of nearly 2% after President Donald Trump said in a social media post that the European Union had been “very difficult to deal with” and that he would recommend a 50% tariff to be imposed on the bloc on June 1. The US dollar slumped to its lowest level since 2023, making commodities priced in the currency more attractive. Geopolitics have been a major focus for traders this week, with a report from CNN that US intelligence suggested Israel was making preparations to strike Iranian nuclear facilities driving brief gains earlier in the week. After that, Araghchi, Iran’s lead negotiator in talks with the US, said a deal was possible that would entail Tehran avoiding nuclear weapons, but not ditching uranium enrichment. Still, the outlook remains overall bearish. Crude has shed about 14% this year, hitting the lowest since 2021 last month, as OPEC+ loosened supply curbs at a faster-than-expected pace, just as the US-led tariff war posed headwinds for demand. Prices had recovered some ground as trade tensions between the US and China eased, but data this week also showed another increase in

Read More »

Poland Says Key Infrastructure at Risk After Baltic Sea Incident

Polish Prime Minister Donald Tusk warned on Thursday that the Baltic Sea is becoming “a new area of confrontation” with Russia, putting the country’s critical infrastructure increasingly at risk. His warning comes a day after Polish authorities said a sanctioned Russian ship was performing “suspicious maneuvers” near the power cable connecting Poland and Sweden. The tanker left for an unspecified Russian port after the Polish armed forces intervened, they said. The undersea power link was not damaged, but Poland is checking whether any explosive devices were planted, the prime minister said after meeting top navy commanders. The Baltic Sea has become a flashpoint in recent months after the detention of several vessels on suspicion of tearing up undersea telecommunications cables. Baltic nations have also increased scrutiny of unregistered tankers due to concerns about sanctioned Russian oil, saying that Moscow’s so-called ‘shadow fleet’ could lead to security breaches and environmental risks. Since Russia’s full-scale invasion of Ukraine there have been “too many incidents” in the Baltic Sea for Poland to take maritime security lightly, Tusk said at a meeting with Polish naval commanders in the coastal city of Gdynia on Thursday.  The risks are keenly felt in Poland, which shares a border with Russia’s ally Belarus and exclave of Kaliningrad, home to a naval base at Baltiysk. On Wednesday, Russia declared that it would defend its vessels in the Baltic Sea, one of the world’s busiest shipping routes, by all legal means after briefly deploying a fighter jet as Estonia tried to halt an oil tanker in its economic zone. In recent years, Poland has expanded its energy infrastructure to wean itself off Russian supplies. It has constructed a gas link to Norway, a liquefied natural gas import terminal as well as expanded port capacities to handle growing flows of goods and military aid to neighboring Ukraine. 

Read More »

New Intel Xeon 6 CPUs unveiled; one powers rival Nvidia’s DGX B300

He added that his read is that “Intel recognizes that Nvidia is far and away the leader in the market for AI GPUs and is seeking to hitch itself to that wagon.” Roberts said, “basically, Intel, which has struggled tremendously and has turned over its CEO amidst a stock slide, needs to refocus to where it thinks it can win. That’s not competing directly with Nvidia but trying to use this partnership to re-secure its foothold in the data center and squeeze out rivals like AMD for the data center x86 market. In other words, I see this announcement as confirmation that Intel is looking to regroup, and pick fights it thinks it can win. “ He also predicted, “we can expect competition to heat up in this space as Intel takes on AMD’s Epyc lineup in a push to simplify and get back to basics.” Matt Kimball, vice president and principal analyst, who focuses on datacenter compute and storage at Moor Insights & Strategy, had a much different view about the announcement. The selection of the Intel sixth generation Xeon CPU, the 6776P, to support Nvidia’s DGX B300 is, he said, “important, as it validates Intel as a strong choice for the AI market. In the big picture, this isn’t about volumes or revenue, rather it’s about validating a strategy Intel has had for the last couple of generations — delivering accelerated performance across critical workloads.”  Kimball said that, In particular, there are a “couple things that I would think helped make Xeon the chosen CPU.”

Read More »

AWS clamping down on cloud capacity swapping; here’s what IT buyers need to know

As of June 1, AWS will no longer allow sub-account transfers or new commitments to be pooled and reallocated across customers. Barrow says the shift is happening because AWS is investing billions in new data centers to meet demand from AI and hyperscale workloads. “That infrastructure requires long-term planning and capital discipline,” he said. Phil Brunkard, executive counselor at Info-Tech Research Group UK, emphasized that AWS isn’t killing RIs or SPs, “it’s just closing a loophole.” “This stops MSPs from bulk‑buying a giant commitment, carving it up across dozens of tenants, and effectively reselling discounted EC2 hours,” he said. “Basically, AWS just tilted the field toward direct negotiations and cleaner billing.” What IT buyers should do now For enterprises that sourced discounted cloud resources through a broker or value-added reseller (VAR), the arbitrage window shuts, Brunkard noted. Enterprises should expect a “modest price bump” on steady‑state workloads and a “brief scramble” to unwind pooled commitments.  If original discounts were broker‑sourced, “budget for a small uptick,” he said. On the other hand, companies that buy their own RIs or SPs, or negotiate volume deals through AWS’s Enterprise Discount Program (EDP), shouldn’t be impacted, he said. Nothing changes except that pricing is now baselined.

Read More »

DriveNets extends AI networking fabric with multi-site capabilities for distributed GPU clusters

“We use the same physical architecture as anyone with top of rack and then leaf and spine switch,” Dudy Cohen, vice president of product marketing at DriveNets, told Network World. “But what happens between our top of rack, which is the switch that connects NICs (network interface cards) into the servers and the rest of the network is not based on Clos Ethernet architecture, rather on a very specific cell-based protocol. [It’s] the same protocol, by the way, that is used in the backplane of the chassis.” Cohen explained that any data packet that comes into an ingress switch from the NIC is cut into evenly sized cells, sprayed across the entire fabric and then reassembled on the other side. This approach distinguishes DriveNets from other solutions that might require specialized components such as Nvidia BlueField DPUs (data processing units) at the endpoints. “The fabric links between the top of rack and the spine are perfectly load balanced,” he said. “We do not use any hashing mechanism… and this is why we can contain all the congestion avoidance within the fabric and do not need any external assistance.” Multi-site implementation for distributed GPU clusters The multi-site capability allows organizations to overcome power constraints in a single data center by spreading GPU clusters across locations. This isn’t designed as a backup or failover mechanism. Lasser-Raab emphasized that it’s a single cluster in two locations that are up to 80 kilometers apart, which allows for connection to different power grids. The physical implementation typically uses high-bandwidth connections between sites. Cohen explained that there is either dark fiber or some DWDM (Dense Wavelength Division Multiplexing) fibre optic connectivity between the sites. Typically the connections are bundles of four 800 gigabit ethernet, acting as a single 3.2 terabit per second connection.

Read More »

Intel eyes exit from NEX unit as focus shifts to core chip business

“That’s something we’re going to expand and build on,” Tan said, according to the report, pointing to Intel’s commanding 68% share of the PC chip market and 55% share in data centers. By contrast, the NEX unit — responsible for silicon and software that power telecom gear, 5G infrastructure, and edge computing — has struggled to deliver the kind of strategic advantage Intel needs. According to the report, Tan and his team view it as non-essential to Intel’s turnaround plans. The report described the telecom side of the business as increasingly disconnected from Intel’s long-term objectives, while also pointing to fierce competition from companies like Broadcom that dominate key portions of the networking silicon market and leave little room for Intel to gain a meaningful share. Financial weight, strategic doubts Despite generating $5.8 billion in revenue in 2024, the NEX business was folded into Intel’s broader Data Center and Client Computing groups earlier this year. The move was seen internally as a signal that NEX had lost its independent strategic relevance and also reflects Tan’s ruthless prioritization.  To some in the industry, the review comes as little surprise. Over the past year, Intel has already shed non-core assets. In April, it sold a majority stake in Altera, its FPGA business, to private equity firm Silver Lake for $4.46 billion, shelving earlier plans for a public listing. This followed the 2022 spinoff of Mobileye, its autonomous driving arm. With a $19 billion loss in 2024 and revenue falling to $53.1 billion, the chipmaker also aims to streamline management, cut $10 billion in costs, and bet on AI chips and foundry services, competing with Nvidia, AMD, and TSMC.

Read More »

Tariff uncertainty weighs on networking vendors

“Our guide assumes current tariffs and exemptions remain in place through the quarter. These include the following: China at 30%, partially offset by an exemption for semiconductors and certain electronic components; Mexico and Canada at 25% for the components and products that are not eligible for the current exemptions,” Cisco CFO Scott Herron told Wall Street analysts in the company’s quarterly earnings report on May 14. At this time, Cisco expects little impact from tariffs on steel and aluminum and retaliatory tariffs, Herron said. “We’ll continue to leverage our world-class supply chain team to help mitigate the impact,” he said, adding that “the flexibility and agility we have built into our operations over the last few years, the size and scale of our supply chain, provides us some unique advantages as we support our customers globally.” “Once the tariff scenario stabilizes, there [are] steps that we can take to mitigate it, as you’ve seen us do with China from the first Trump administration. And only after that would we consider price [increases],” Herron said. Similarly, Extreme Networks noted the changing tariff conditions during its earnings call on April 30. “The tariff situation is very dynamic, I think, as everybody knows and can appreciate, and it’s kind of hard to call. Yes, there was concern initially given the magnitude of tariffs,” said Extreme Networks CEO Ed Meyercord on the earnings call. “The larger question is, will all of the changes globally in trade and tariff policy have an impact on demand? And that’s hard to call at this point. And we’re going to hold as far as providing guidance or judgment on that until we have finality come July.” Financial news Meanwhile, AI is fueling high expectations and influencing investments in enterprise campus and data center environments.

Read More »

Liquid cooling becoming essential as AI servers proliferate

“Facility water loops sometimes have good water quality, sometimes bad,” says My Troung, CTO at ZutaCore, a liquid cooling company. “Sometimes you have organics you don’t want to have inside the technical loop.” So there’s one set of pipes that goes around the data center, collecting the heat from the server racks, and another set of smaller pipes that lives inside individual racks or servers. “That inner loop is some sort of technical fluid, and the two loops exchange heat across a heat exchanger,” says Troung.

The most common approach today, he says, is to use a single-phase liquid — one that stays in liquid form and never evaporates into a gas — such as water or propylene glycol. But it’s not the most efficient option. Evaporation is a great way to dissipate heat. That’s what our bodies do when we sweat. When water goes from a liquid to a gas it’s called a phase change, and it uses up energy and makes everything around it slightly cooler. Of course, few servers run hot enough to boil water — but they can boil other liquids.

“Two-phase is the most efficient cooling technology,” says Xianming (Simon) Dai, a professor at University of Texas at Dallas. And it might be here sooner than you think. In a keynote address in March at Nvidia GTC, Nvidia CEO Jensen Huang unveiled the Rubin Ultra NVL576, due in the second half of 2027 — with 600 kilowatts per rack. “With the 600-kilowatt racks that Nvidia is announcing, the industry will have to shift very soon from single-phase approaches to two-phase,” says ZutaCore’s Troung.

Another highly efficient cooling approach is immersion cooling. According to a Castrol survey released in March, 90% of 600 data center industry leaders say that they are considering switching to immersion
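The thermodynamics behind Dai’s point can be made concrete with a quick back-of-the-envelope calculation. As a rough sketch (the constants below are standard textbook values for water, not figures from the article), the latent heat absorbed when a liquid changes phase dwarfs the sensible heat absorbed by simply warming the same liquid:

```python
# Back-of-the-envelope: sensible heating vs. phase change for water.
# Constants are approximate textbook values, not figures from the article.
SPECIFIC_HEAT_WATER = 4186         # J per kg per K (liquid water)
LATENT_HEAT_VAPORIZATION = 2.26e6  # J per kg (water, near boiling)

def sensible_heat_j(mass_kg: float, delta_t_k: float) -> float:
    """Heat absorbed by liquid water warming up, with no phase change."""
    return mass_kg * SPECIFIC_HEAT_WATER * delta_t_k

def latent_heat_j(mass_kg: float) -> float:
    """Heat absorbed when the same mass of water evaporates."""
    return mass_kg * LATENT_HEAT_VAPORIZATION

warming = sensible_heat_j(1.0, 10.0)  # 1 kg of water warming by 10 K
boiling = latent_heat_j(1.0)          # 1 kg of water evaporating
print(f"Warming 1 kg by 10 K: {warming:,.0f} J")
print(f"Evaporating 1 kg:     {boiling:,.0f} J ({boiling / warming:.0f}x more)")
```

Evaporating a kilogram of water soaks up roughly 50 times more heat than warming it by 10 K, which is why a two-phase loop can move far more heat per unit of coolant than a single-phase one.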

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).

In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion.

The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based company has been in business for 187 years, yet it has been a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech.

The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.)

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved.

“Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail.

Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to
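The LLM-as-judge pattern the excerpt describes can be sketched in a few lines: several candidate models answer the same prompt, and a separate judge model picks the best response. This is a minimal illustration, not any provider’s actual API; `call_model` is a hypothetical stand-in, stubbed here with canned responses so the flow is runnable.

```python
# Minimal LLM-as-judge sketch: candidate models answer, a judge model scores.
# `call_model` is a hypothetical stand-in for a real provider API,
# stubbed with canned responses purely for illustration.
def call_model(model: str, prompt: str) -> str:
    """Hypothetical model call, stubbed for this sketch."""
    canned = {
        "model-a": "Paris is the capital of France.",
        "model-b": "The capital of France is Paris.",
        "judge": "winner: 1",  # the judge picks the candidate at index 1
    }
    return canned[model]

def judge_best(question: str, candidates: list[str], judge: str = "judge") -> str:
    """Collect one answer per candidate model, then ask the judge to choose."""
    answers = [call_model(m, question) for m in candidates]
    numbered = "\n".join(f"{i}: {a}" for i, a in enumerate(answers))
    prompt = (
        f"Question: {question}\nAnswers:\n{numbered}\n"
        "Reply with 'winner: <index>'."
    )
    verdict = call_model(judge, prompt)
    index = int(verdict.split(":")[1])
    return answers[index]

best = judge_best("What is the capital of France?", ["model-a", "model-b"])
print(best)
```

In a real deployment the canned dictionary would be replaced by network calls, and as the excerpt notes, falling model prices are what make running three or more models per query economical.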

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more.

The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model, because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI and even the U.S. National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks.

Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.

What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »