New 1.5B router model achieves 93% accuracy without costly retraining

Stay Ahead, Stay ONMINE

New 1.5B router model achieves 93% accuracy without costly retraining

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM). For enterprises […]

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM).

For enterprises building products that rely on multiple LLMs, Arch-Router aims to solve a key challenge: how to direct queries to the best model for the job without relying on rigid logic or costly retraining every time something changes.

The challenges of LLM routing

As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the unique strengths of each model for specific tasks (e.g., code generation, text summarization, or image editing).

LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most appropriate model.

Existing routing methods generally fall into two categories: “task-based routing,” where queries are routed based on predefined tasks, and “performance-based routing,” which seeks an optimal balance between cost and performance.

However, task-based routing struggles with unclear or shifting user intentions, particularly in multi-turn conversations. Performance-based routing, on the other hand, rigidly prioritizes benchmark scores, often neglects real-world user preferences and adapts poorly to new models unless it undergoes costly fine-tuning.

More fundamentally, as the Katanemo Labs researchers note in their paper, “existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria.”

The researchers highlight the need for routing systems that “align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve.”

A new framework for preference-aligned routing

To address these limitations, the researchers propose a “preference-aligned routing” framework that matches queries to routing policies based on user-defined preferences.

In this framework, users define their routing policies in natural language using a “Domain-Action Taxonomy.” This is a two-level hierarchy that reflects how people naturally describe tasks, starting with a general topic (the Domain, such as “legal” or “finance”) and narrowing to a specific task (the Action, such as “summarization” or “code generation”).

Each of these policies is then linked to a preferred model, allowing developers to make routing decisions based on real-world needs rather than just benchmark scores. As the paper states, “This taxonomy serves as a mental model to help users define clear and structured routing policies.”

The routing process happens in two stages. First, a preference-aligned router model takes the user query and the full set of policies and selects the most appropriate policy. Second, a mapping function connects that selected policy to its designated LLM.

Because the model selection logic is separated from the policy, models can be added, removed, or swapped simply by editing the routing policies, without any need to retrain or modify the router itself. This decoupling provides the flexibility required for practical deployments, where models and use cases are constantly evolving.

Preference-aligned routing framework (source: arXiv) — *Preference-aligned routing framework Source: arXiv*

The policy selection is powered by Arch-Router, a compact 1.5B parameter language model fine-tuned for preference-aligned routing. Arch-Router receives the user query and the complete set of policy descriptions within its prompt. It then generates the identifier of the best-matching policy.

Since the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning and without retraining. This generative approach allows Arch-Router to use its pre-trained knowledge to understand the semantics of both the query and the policies, and to process the entire conversation history at once.

A common concern with including extensive policies in a prompt is the potential for increased latency. However, the researchers designed Arch-Router to be highly efficient. “While the length of routing policies can get long, we can easily increase the context window of Arch-Router with minimal impact on latency,” explains Salman Paracha, co-author of the paper and Founder/CEO of Katanemo Labs. He notes that latency is primarily driven by the length of the output, and for Arch-Router, the output is simply the short name of a routing policy, like “image_editing” or “document_creation.”

Arch-Router in action

To build Arch-Router, the researchers fine-tuned a 1.5B parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples. They then tested its performance against state-of-the-art proprietary models from OpenAI, Anthropic and Google on four public datasets designed to evaluate conversational AI systems.

The results show that Arch-Router achieves the highest overall routing score of 93.17%, surpassing all other models, including top proprietary ones, by an average of 7.71%. The model’s advantage grew with longer conversations, demonstrating its strong ability to track context over multiple turns.

Arch-Router vs other models (source: arXiv) — *Arch-Router vs other models Source: arXiv*

In practice, this approach is already being applied in several scenarios, according to Paracha. For example, in open-source coding tools, developers use Arch-Router to direct different stages of their workflow, such as “code design,” “code understanding,” and “code generation,” to the LLMs best suited for each task. Similarly, enterprises can route document creation requests to a model like Claude 3.7 Sonnet while sending image editing tasks to Gemini 2.5 Pro.

The system is also ideal “for personal assistants in various domains, where users have a diversity of tasks from text summarization to factoid queries,” Paracha said, adding that “in those cases, Arch-Router can help developers unify and improve the overall user experience.”

This framework is integrated with Arch, Katanemo Labs’ AI-native proxy server for agents, which allows developers to implement sophisticated traffic-shaping rules. For instance, when integrating a new LLM, a team can send a small portion of traffic for a specific routing policy to the new model, verify its performance with internal metrics, and then fully transition traffic with confidence. The company is also working to integrate its tools with evaluation platforms to streamline this process for enterprise developers further.

Ultimately, the goal is to move beyond siloed AI implementations. “Arch-Router—and Arch more broadly—helps developers and enterprises move from fragmented LLM implementations to a unified, policy-driven system,” says Paracha. “In scenarios where user tasks are diverse, our framework helps turn that task and LLM fragmentation into a unified experience, making the final product feel seamless to the end user.”

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

NaaS security strategy – building a software-defined architecture

Besfore you implement NaaS security controls, it’s important that you conduct a detailed technical assessment of your current and target architectures. Comparing the core architectural differences for security across traditional and NaaS environments.

US lets China buy semiconductor design software again

The reversal marks a dramatic shift from the aggressive stance the Trump administration took in May, when it imposed sweeping restrictions on electronic design automation (EDA) software — the critical tools needed to design advanced semiconductors. A short-lived stoppage The restrictions had targeted what analysts called the “upstream” of chip

Hardcoded root credentials in Cisco Unified CM trigger max-severity alert

The affected products-Cisco Unified CM and Unified CM SME–are core components of enterprise telephony infrastructure, widely deployed across government agencies, financial institutions, and large corporations to manage voice, video, and messaging at scale. A flaw in these systems could allow attackers to compromise an organization’s communications, letting them log in

HPE finalizes Juniper acquisition, forms new AI-centric networking unit

“We have agreed with the DOJ to offer a license, through an auction, to specific aspects of Juniper Mist, which is just the AI operations part,” HPE CEO Antonio Neri explained during the press conference. The distinction is technically significant. Competitors will gain access to Mist’s anomaly detection and predictive failure

Great British Energy Gets Permanent CEO

A release posted on the UK government website on Monday announced that Dan McGrail has been appointed as the permanent Chief Executive Officer of Great British Energy. McGrail will be based in Scotland, working from the Aberdeen headquarters, on a permanent contract with Great British Energy, the release noted, highlighting that he took up the post of interim CEO in March on secondment from RenewableUK. “His appointment brings world class private sector experience to Great British Energy, with the former Chief Executive of RenewableUK and CEO of Siemens Engines now leading the UK’s publicly owned clean power revolution,” the UK government release stated. “Under his stewardship as interim CEO for the last four months, he has helped rapidly set up the company,” the release added. “This includes announcing GBP 1 billion ($1.36 billion) for Great British Energy to invest in clean energy supply chains such as electric cables and floating offshore wind platforms to ensure the clean energy revolution is built here in Britain,” it continued. In the release, McGrail said, “it is a privilege to take on the CEO role permanently and lead Great British Energy from our Aberdeen HQ at such a pivotal moment”. “We are already delivering for British people, with schools and hospitals set to benefit from cheaper energy bills,” he added. “We will now focus on scaling up as Britain’s publicly owned energy company, making strategic investments that drive forward the government’s clean power mission and give people a stake in clean energy,” he went on to state. UK Energy Secretary Ed Miliband said in the release, “Dan has been a visionary leader as Great British Energy’s interim CEO and will bring world class private sector experience to our publicly owned clean power company”. “Great British Energy is at the heart of our clean power mission and Plan for Change and is investing in clean

Iberdrola Approves Supplemental Dividend for 2024

Iberdrola SA will distribute EUR 0.409 ($0.48) per share as a supplementary dividend for 2024, raising total shareholder remuneration for last year’s results to EUR 0.645 gross per unit. The total 2024 distribution represents a 15.6 percent increase from the previous year, the Spanish multinational power utility said in an online statement. “Investors will have three options: to receive the amount corresponding to their supplementary dividend (EUR 0.409 gross per share) in cash; to sell their rights on the market; or to obtain new bonus shares from the group free of charge”, Iberdrola said. “These three options are not mutually exclusive, so shareholders can choose one of the alternatives or combine them”. Iberdrola had already paid an interim dividend of EUR 0.231 gross per share in January, followed by an “engagement dividend” of EUR 0.005 gross per share that the company pledged for reaching a quorum of 70 percent of its share capital at the meeting of shareholders last May. “Iberdrola is ahead of schedule in meeting its commitment to establish a dividend of between EUR 0.61 and EUR 0.66 per share in 2026”, it said. Iberdrola scheduled July 23 for the release of its results for the first half of 2025. For the first quarter (Q1) it had reported EUR 12.86 billion in revenue, up 1.5 percent from the same three-month period last year. However, net profit fell to EUR 2 billion, or EUR 0,302 per share – compared to EUR 2.76 billion for Q1 2025. Earnings before interest, taxes, depreciation and amortization (EBITDA) dropped from EUR 5.86 billion for Q1 2024 to EUR 4.64 billion for Q1 2025. “Excluding the capital gains from the divestment of thermal generation assets in the first quarter of 2024, net profit increased by 26 percent and EBITDA increased by 12 percent”, Iberdrola said

EnerMech Bags Triton FPSO Work

EnerMech Limited has secured a three-year contract with Dana Petroleum Limited for Offshore Shutdown Support Services for the Triton floating, production, storage, and offloading (FPSO) vessel. The fixed-term contract includes a couple of two-year extension options, EnerMech said in a media release. The Triton FPSO is located approximately 120 miles east of Aberdeen in the North Sea. EnerMech noted Dana Petroleum has operated the Triton FPSO since 2012, playing a key role in the region’s energy landscape since oil production began in 2000. Under the agreement, EnerMech will incorporate its own System Integrity Management (SIMPro) software. This technology offers complete lifecycle monitoring and real-time data, improving operational safety, compliance, and efficiency, according to EnerMech. “Supporting the Triton FPSO marks a significant milestone for our North Sea operations and aligns with our wider global growth strategy. Securing this long-term partnership with Dana Petroleum is a testament to our technical capabilities, innovation, and ability to deliver value across the full lifecycle of an asset. We work on FPSO assets around the world, providing expert pre-commissioning through to end-of-life services, and this agreement highlights our ongoing commitment to safe, efficient, and sustainable operations”, Charles Davison Jr., EnerMech CEO, said. “In this instance, we are delivering a tailored solution that enables full visibility of the Triton FPSO’s condition as it enters late-life operations. Combined with our proven methodologies and responsive support model, we’re proud to be helping extend asset life, minimize downtime, and ensure safe and reliable performance”, he said. “An advanced web-based management solution, SIMPro provides fast, high-quality, and easily accessible information, significantly improving the speed and accuracy of electronically generated work packs”, Frazer Thomson, SVP for Energy Solutions, added. “EnerMech has built a strong industry-recognized reputation for setting a high bar for fluid power service operations, and we look forward to continuing

Valaris Secures New Works for Two of Its Drillships

Offshore drilling services provider Valaris Limited has secured more work for two of its units. The company said in a media release it has secured a 940-day contract extension for drillship Valaris DS-16, starting in June 2026. It has also secured a new 914-day contract for drillship Valaris DS-18, expected to start in the mid-fourth quarter 2026, with Anadarko Petroleum Corp., a wholly-owned subsidiary of Occidental, in the Gulf of America. The combined addition to the contracted revenue backlog is approximately $760 million, Valaris said. “We’ve secured approximately $1.9 billion in new contract backlog so far this year, reflecting solid execution of our commercial strategy and our ability to deliver safe and efficient operations for our customers. We remain focused on securing additional attractive, long-term contracts for our high-specification assets that will further support our earnings and cash flow”, Anton Dibowitz, President and Chief Executive Officer, said. Earlier the company agreed to sell its jackup Valaris 247 to BW Energy for cash proceeds of approximately $108 million. This sale is expected to close in the second half of 2025, subject to customary closing conditions. As part of the sales agreement, BWE will be restricted from using the rig outside of BWE-owned or affiliated properties for the rig’s expected remaining useful life, Valaris said. At the time of the sale, Dibowitz said that the 27-year-old vessel was working offshore Australia. Valaris had also secured a five-well contract offshore West Africa for Valaris DS-15. That contract, according to the company, is expected to start in the third quarter of 2025 and run for 250 days. The contract is valued $130 million. To contact the author, email [email protected] What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is a new

Petronas Strengthens Partnerships for Sabah Oil and Gas Developments

Petroliam Nasional Berhad (Petronas) has formalized several key agreements with the Sabah government, focusing on the responsible development of the state’s energy resources. Petronas said in a media release the event marked a significant milestone in Sabah’s energy growth with the official handover of the Sabah Gas Strategy, a project led jointly by Petronas and the state government. Guided by the Sabah Joint Coordination Committee, the strategy was crafted by a Joint Task Force including Petronas, SMJ Energy Sdn. Bhd., Sabah Energy Corp. Sdn. Bhd., the Energy Commission of Sabah, and the Ministry of Industrial Development and Entrepreneurship. Petronas added that the partnerships build on the Commercial Collaboration Agreement it signed with the state government in 2021. This agreement is a blueprint for securing a reliable natural gas supply for the state’s domestic demand, Petronas said. Additionally, Petronas said that it has entered into a Technical Evaluation Agreement concerning the Layang-Layang Basin off the coast of Sabah, a frontier basin covering roughly 44,500 square kilometers (17,180 square miles). The TEA, which was established with ConocoPhillips Malaysia New Ventures Ltd. and Pertamina Hulu Energi (Pertamina), facilitates subsurface research, which includes regional geological evaluations and in-depth prospectivity assessments, according to Petronas. To further strengthen exploration and production cooperation, Petronas formalized a Memorandum of Understanding (MoU) with Dialog Resources Sdn. Bhd. (DIALOG) to advance the development of the Mutiara Cluster off Sabah’s East Coast. Under the Malaysia Bid Round 2025, the Mutiara Cluster Small Field Asset Production Sharing Contract was awarded to Dialog in June. This cluster provides an opportunity to monetize discovered resources through a cost-effective, optimized development plan that leverages existing materials and equipment, using past insights to accelerate start-up, Petronas said. As part of its initiatives to improve basin understanding and unlock future growth opportunities for Malaysia’s production, Petronas added

Cyprus Announces New Gas Discovery in Block 10

Exxon Mobil Corp. and QatarEnergy have made another natural gas discovery in Block 10 offshore Cyprus, the presidential office said Monday. “The ‘Pegasus-1’ well, located about 190 kilometers offshore southwest of Cyprus, indicated approximately 350 meters of gas-bearing reservoir”, the presidency said in an online statement after meeting with ExxonMobil officials. The well was drilled by Valaris DS-9 in 1,921 meters (6,302.49 feet) of water, according to the statement. “Further assessment will be required in the coming months to evaluate the results”, the statement added. This is the second discovery in the block, which, according to ExxonMobil, spans 635,554 acres (2,572 square kilometers) southwest of Cyprus in the Eastern Mediterranean. Seven years ago the partners found a gas-bearing reservoir about 436 feet in Glaucus-1. The discovery was drilled 13,780 feet deep in 6,769 feet of water, according to ExxonMobil. “Based on preliminary interpretation of the well data, the discovery could represent an in-place natural gas resource of approximately 5 trillion to 8 trillion cubic feet (142 billion to 227 billion cubic meters)”, the United States energy giant said in a press release February 28, 2019. Steve Greenlee, then-president of ExxonMobil Exploration Co., said, “These are encouraging results in a frontier exploration area”. An earlier well, Delphyne-1, did not yield commercial quantities of hydrocarbons. In March 2022 appraisal well Glaucus-2 confirmed a “high-quality gas-bearing reservoir”, according to the government. As of 2022, the best estimate of gas-in-place resources in the Glaucus discovery was 3.7 trillion cubic feet, according to online information from Cyprus’ Energy, Commerce and Industry Ministry. ExxonMobil operates Block 10 with a 60 percent stake. QatarEnergy holds 40 percent. The license was awarded April 2017. Texas-based ExxonMobil and state-owned QatarEnergy are also partners in Cyprus’ Block 5, for which they signed an exploration and production sharing contract December 2021

CoreWeave acquires Core Scientific for $9B to power AI infrastructure push

Such a shift, analysts say, could offer short-term benefits for enterprises, particularly in cost and access, but also introduces new operational risks. “This acquisition may potentially lower enterprise pricing through lease cost elimination and annual savings, while improving GPU access via expanded power capacity, enabling faster deployment of Nvidia chipsets and systems,” said Charlie Dai, VP and principal analyst at Forrester. “However, service reliability risks persist during this crypto-to-AI retrofitting.” This also indicates that struggling vendors such as Core Scientific and similar have a way to cash out, according to Yugal Joshi, partner at Everest Group. “However, it does not materially impact the availability of Nvidia GPUs and similar for enterprises,” Joshi added. “Consolidation does impact the pricing power of vendors.” Concerns for enterprises Rising demand for AI-ready infrastructure can raise concerns among enterprises, particularly over access to power-rich data centers and future capacity constraints. “The biggest concern that CIOs should have with this acquisition is that mature data center infrastructure with dedicated power is an acquisition target,” said Hyoun Park, CEO and chief analyst at Amalgam Insights. “This may turn out to create challenges for CIOs currently collocating data workloads or seeking to keep more of their data loads on private data centers rather than in the cloud.”

CoreWeave achieves a first with Nvidia GB300 NVL72 deployment

The deployment, Kimball said, “brings Dell quality to the commodity space. Wins like this really validate what Dell has been doing in reshaping its portfolio to accommodate the needs of the market — both in the cloud and the enterprise.” Although concerns were voiced last year that Nvidia’s next-generation Blackwell data center processors had significant overheating problems when they were installed in high-capacity server racks, he said that a repeat performance is unlikely. Nvidia, said Kimball “has been very disciplined in its approach with its GPUs and not shipping silicon until it is ready. And Dell almost doubles down on this maniacal quality focus. I don’t mean to sound like I have blind faith, but I’ve watched both companies over the last several years be intentional in delivering product in volume. Especially as the competitive market starts to shape up more strongly, I expect there is an extremely high degree of confidence in quality.” CoreWeave ‘has one purpose’ He said, “like Lambda Labs, Crusoe and others, [CoreWeave] seemingly has one purpose (for now): deliver GPU capacity to the market. While I expect these cloud providers will expand in services, I think for now the type of customer employing services is on the early adopter side of AI. From an enterprise perspective, I have to think that organizations well into their AI journey are the consumers of CoreWeave.” “CoreWeave is also being utilized by a lot of the model providers and tech vendors playing in the AI space,” Kimball pointed out. “For instance, it’s public knowledge that Microsoft, OpenAI, Meta, IBM and others use CoreWeave GPUs for model training and more. It makes sense. These are the customers that truly benefit from the performance lift that we see from generation to generation.”

Oracle to power OpenAI’s AGI ambitions with 4.5GW expansion

“For CIOs, this shift means more competition for AI infrastructure. Over the next 12–24 months, securing capacity for AI workloads will likely get harder, not easier. Though cost is coming down but demand is increasing as well, due to which CIOs must plan earlier and build stronger partnerships to ensure availability,” said Pareekh Jain, CEO at EIIRTrend & Pareekh Consulting. He added that CIOs should expect longer wait times for AI infrastructure. To mitigate this, they should lock in capacity through reserved instances, diversify across regions and cloud providers, and work with vendors to align on long-term demand forecasts. “Enterprises stand to benefit from more efficient and cost-effective AI infrastructure tailored to specialized AI workloads, significantly lower their overall future AI-related investments and expenses. Consequently, CIOs face a critical task: to analyze and predict the diverse AI workloads that will prevail across their organizations, business units, functions, and employee personas in the future. This foresight will be crucial in prioritizing and optimizing AI workloads for either in-house deployment or outsourced infrastructure, ensuring strategic and efficient resource allocation,” said Neil Shah, vice president at Counterpoint Research. Strategic pivot toward AI data centers The OpenAI-Oracle deal comes in stark contrast to developments earlier this year. In April, AWS was reported to be scaling back its plans for leasing new colocation capacity — a move that AWS Vice President for global data centers Kevin Miller described as routine capacity management, not a shift in long-term expansion plans. Still, these announcements raised questions around whether the hyperscale data center boom was beginning to plateau. “This isn’t a slowdown, it’s a strategic pivot. The era of building generic data center capacity is over. The new global imperative is a race for specialized, high-density, AI-ready compute. Hyperscalers are not slowing down; they are reallocating their capital to

Arista Buys VeloCloud to reboot SD-WANs amid AI infrastructure shift

What this doesn’t answer is how Arista Networks plans to add newer, security-oriented Secure Access Service Edge (SASE) capabilities to VeloCloud’s older SD-WAN technology. Post-acquisition, it still has only some of the building blocks necessary to achieve this. Mapping AI However, in 2025 there is always more going on with networking acquisitions than simply adding another brick to the wall, and in this case it’s the way AI is changing data flows across networks. “In the new AI era, the concepts of what comprises a user and a site in a WAN have changed fundamentally. The introduction of agentic AI even changes what might be considered a user,” wrote Arista Networks CEO, Jayshree Ullal, in a blog highlighting AI’s effect on WAN architectures. “In addition to people accessing data on demand, new AI agents will be deployed to access data independently, adapting over time to solve problems and enhance user productivity,” she said. Specifically, WANs needed modernization to cope with the effect AI traffic flows are having on data center traffic. Sanjay Uppal, now VP and general manager of the new VeloCloud Division at Arista Networks, elaborated. “The next step in SD-WAN is to identify, secure and optimize agentic AI traffic across that distributed enterprise, this time from all end points across to branches, campus sites, and the different data center locations, both public and private,” he wrote. “The best way to grab this opportunity was in partnership with a networking systems leader, as customers were increasingly looking for a comprehensive solution from LAN/Campus across the WAN to the data center.”

Data center capacity continues to shift to hyperscalers

However, even though colocation and on-premises data centers will continue to lose share, they will still continue to grow. They just won’t be growing as fast as hyperscalers. So, it creates the illusion of shrinkage when it’s actually just slower growth. In fact, after a sustained period of essentially no growth, on-premises data center capacity is receiving a boost thanks to genAI applications and GPU infrastructure. “While most enterprise workloads are gravitating towards cloud providers or to off-premise colo facilities, a substantial subset are staying on-premise, driving a substantial increase in enterprise GPU servers,” said John Dinsdale, a chief analyst at Synergy Research Group.

Oracle inks $30 billion cloud deal, continuing its strong push into AI infrastructure.

He pointed out that, in addition to its continued growth, OCI has a remaining performance obligation (RPO) — total future revenue expected from contracts not yet reported as revenue — of $138 billion, a 41% increase, year over year. The company is benefiting from the immense demand for cloud computing largely driven by AI models. While traditionally an enterprise resource planning (ERP) company, Oracle launched OCI in 2016 and has been strategically investing in AI and data center infrastructure that can support gigawatts of capacity. Notably, it is a partner in the $500 billion SoftBank-backed Stargate project, along with OpenAI, Arm, Microsoft, and Nvidia, that will build out data center infrastructure in the US. Along with that, the company is reportedly spending about $40 billion on Nvidia chips for a massive new data center in Abilene, Texas, that will serve as Stargate’s first location in the country. Further, the company has signaled its plans to significantly increase its investment in Abu Dhabi to grow out its cloud and AI offerings in the UAE; has partnered with IBM to advance agentic AI; has launched more than 50 genAI use cases with Cohere; and is a key provider for ByteDance, which has said it plans to invest $20 billion in global cloud infrastructure this year, notably in Johor, Malaysia. Ellison’s plan: dominate the cloud world CTO and co-founder Larry Ellison announced in a recent earnings call Oracle’s intent to become No. 1 in cloud databases, cloud applications, and the construction and operation of cloud data centers. He said Oracle is uniquely positioned because it has so much enterprise data stored in its databases. He also highlighted the company’s flexible multi-cloud strategy and said that the latest version of its database, Oracle 23ai, is specifically tailored to the needs of AI workloads. Oracle

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Stay Ahead, Stay ONMINE