How Sakana AI’s new evolutionary algorithm builds powerful AI models without expensive retraining

A new evolutionary technique from Japan-based AI lab Sakana AI enables developers to augment the capabilities of AI models without costly training and fine-tuning processes. The technique, called Model Merging of Natural Niches (M2N2), overcomes the limitations of other model merging methods and can even evolve new models entirely from scratch.

M2N2 can be applied to different types of machine learning models, including large language models (LLMs) and text-to-image generators. For enterprises looking to build custom AI solutions, the approach offers a powerful and efficient way to create specialized models by combining the strengths of existing open-source variants.

What is model merging?

Model merging is a technique for integrating the knowledge of multiple specialized AI models into a single, more capable model. Instead of fine-tuning, which refines a single pre-trained model using new data, merging combines the parameters of several models simultaneously. This process can consolidate a wealth of knowledge into one asset without requiring expensive, gradient-based training or access to the original training data.

For enterprise teams, this offers several practical advantages over traditional fine-tuning. In comments to VentureBeat, the paper’s authors said model merging is a gradient-free process that only requires forward passes, making it computationally cheaper than fine-tuning, which involves costly gradient updates. Merging also sidesteps the need for carefully balanced training data and mitigates the risk of “catastrophic forgetting,” where a model loses its original capabilities after learning a new task. The technique is especially powerful when the training data for specialist models isn’t available, as merging only requires the model weights themselves.

Early approaches to model merging required significant manual effort, as developers adjusted coefficients through trial and error to find the optimal blend. More recently, evolutionary algorithms have helped automate this process by searching for the optimal combination of parameters. However, a significant manual step remains: developers must still define fixed sets of mergeable parameters, such as layers, in advance. This restriction limits the search space and can prevent the discovery of more powerful combinations.
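To make the idea of a blending coefficient concrete, here is a minimal sketch of weight-space linear interpolation, one common form of merging. It is an illustration under stated assumptions, not the method of any particular tool: models are represented as plain dictionaries of float lists, and `merge_linear`, `sweep_coefficients`, and the caller-supplied `score` function are hypothetical names.

```python
def merge_linear(model_a, model_b, alpha):
    """Blend two models with identical parameter names and shapes.

    alpha = 1.0 returns model_a's weights, alpha = 0.0 returns model_b's.
    (Hypothetical helper; models are dicts mapping names to float lists.)
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(model_a[name], model_b[name])]
        for name in model_a
    }


def sweep_coefficients(model_a, model_b, score, grid):
    """Manual-style search: try each coefficient and keep the best-scoring one.

    `score` is an assumed evaluation function (higher is better) that only
    needs forward passes on the merged model, not gradients.
    """
    best_alpha = max(grid, key=lambda alpha: score(merge_linear(model_a, model_b, alpha)))
    return best_alpha, merge_linear(model_a, model_b, best_alpha)
```

In practice the `score` call would run the merged model on a validation set. The sweep over `grid` stands in for the trial-and-error step that evolutionary search later automated.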

How M2N2 works

M2N2 addresses these limitations by drawing inspiration from evolutionary principles in nature. The algorithm has three key features that allow it to explore a wider range of possibilities and discover more effective model combinations.

*Model Merging of Natural Niches Source: arXiv*

First, M2N2 eliminates fixed merging boundaries, such as blocks or layers. Instead of grouping parameters by pre-defined layers, it uses flexible “split points” and “mixing ratios” to divide and combine models. This means that, for example, the algorithm might merge 30% of the parameters in one layer from Model A with 70% of the parameters from the same layer in Model B. The process starts with an “archive” of seed models. At each step, M2N2 selects two models from the archive, determines a mixing ratio and a split point, and merges them. If the resulting model performs well, it is added back to the archive, replacing a weaker one. This allows the algorithm to explore increasingly complex combinations over time. As the researchers note, “This gradual introduction of complexity ensures a wider range of possibilities while maintaining computational tractability.”
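The loop described above can be sketched in a few lines of Python. This is a simplified reading of the article's description, not the released M2N2 code: models are flat lists of floats, the blend is plain linear interpolation, the complementary ratio is used after the split point, parents are picked at random, and `merge_with_split`, `evolve`, and `fitness` are hypothetical names.

```python
import random


def merge_with_split(theta_a, theta_b, mix_ratio, split_point):
    """Blend two flat parameter lists around a split index.

    Assumption: before the split, model A gets weight `mix_ratio`;
    from the split onward, it gets the complementary weight.
    """
    if len(theta_a) != len(theta_b):
        raise ValueError("parameter lists must have the same length")
    merged = []
    for i, (a, b) in enumerate(zip(theta_a, theta_b)):
        w = mix_ratio if i < split_point else 1.0 - mix_ratio
        merged.append(w * a + (1.0 - w) * b)
    return merged


def evolve(archive, fitness, steps, rng):
    """Repeatedly merge two archive members; a child replaces the weakest
    member only if it scores higher (higher fitness is better)."""
    archive = [list(model) for model in archive]  # work on a copy
    for _ in range(steps):
        parent_a, parent_b = rng.sample(archive, 2)
        mix_ratio = rng.random()
        split_point = rng.randrange(len(parent_a) + 1)
        child = merge_with_split(parent_a, parent_b, mix_ratio, split_point)
        worst = min(range(len(archive)), key=lambda i: fitness(archive[i]))
        if fitness(child) > fitness(archive[worst]):
            archive[worst] = child
    return archive
```

Because the split point and mixing ratio are sampled rather than fixed per layer, the search is not confined to hand-picked layer boundaries. This sketch omits the diversity and pairing mechanisms the article describes next.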

Second, M2N2 manages the diversity of its model population through competition. To understand why diversity is crucial, the researchers offer a simple analogy: “Imagine merging two answer sheets for an exam… If both sheets have exactly the same answers, combining them does not make any improvement. But if each sheet has correct answers for different questions, merging them gives a much stronger result.” Model merging works the same way. The challenge, however, is defining what kind of diversity is valuable. Instead of relying on hand-crafted metrics, M2N2 simulates competition for limited resources. This nature-inspired approach naturally rewards models with unique skills, as they can “tap into uncontested resources” and solve problems others can’t. These niche specialists, the authors note, are the most valuable for merging.
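One way to picture competition for limited resources is fitness sharing, where each data point offers a fixed amount of reward that is split among the models that solve it. The sketch below is an assumption about how such a scheme can look, not the paper's exact formula: `shared_fitness`, the `capacity` parameter, and the score matrix layout are all hypothetical.

```python
def shared_fitness(scores, capacity=1.0):
    """Split each example's fixed reward among the models that score on it.

    scores[i][j] is model i's non-negative score on example j.
    A model that is the only one to solve an example keeps its whole
    reward; models that all solve the same example must share it.
    """
    n_models = len(scores)
    n_examples = len(scores[0])
    fitness = [0.0] * n_models
    for j in range(n_examples):
        total = sum(scores[i][j] for i in range(n_models))
        if total == 0:
            continue  # nobody solves this example, so no reward is paid out
        for i in range(n_models):
            fitness[i] += capacity * scores[i][j] / total
    return fitness
```

With two models that both solve only the first example and a third that alone solves the second, the third model ends up with the highest shared fitness even though every model has the same raw accuracy. That is the sense in which niche specialists are rewarded.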

Third, M2N2 uses a heuristic called “attraction” to pair models for merging. Rather than simply combining the top-performing models as in other merging algorithms, it pairs them based on their complementary strengths. An “attraction score” identifies pairs where one model performs well on data points that the other finds challenging. This improves both the efficiency of the search and the quality of the final merged model.
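A simple stand-in for such a score sums, over all data points, how much the candidate partner outperforms the first model. This is a hedged simplification for illustration: the article does not give the formula, and `attraction` and `pick_partner` are hypothetical names.

```python
def attraction(scores_a, scores_b):
    """How much model B could help model A.

    Sums B's advantage over A on every example where B is better;
    examples where A is already at least as good contribute nothing.
    """
    return sum(max(b - a, 0.0) for a, b in zip(scores_a, scores_b))


def pick_partner(first, candidates):
    """Return the index of the candidate with the highest attraction to `first`."""
    return max(range(len(candidates)), key=lambda i: attraction(first, candidates[i]))
```

Under this scoring, a clone of the first model has zero attraction regardless of how accurate it is, while a model that succeeds exactly where the first one fails scores highest.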

M2N2 in action

The researchers tested M2N2 across three different domains, demonstrating its versatility and effectiveness.

The first was a small-scale experiment evolving neural network–based image classifiers from scratch on the MNIST dataset. M2N2 achieved the highest test accuracy by a substantial margin compared to other methods. The results showed that its diversity-preservation mechanism was key, allowing it to maintain an archive of models with complementary strengths that facilitated effective merging while systematically discarding weaker solutions.

Next, they applied M2N2 to LLMs, combining a math specialist model (WizardMath-7B) with an agentic specialist (AgentEvol-7B), both of which are based on the Llama 2 architecture. The goal was to create a single agent that excelled at both math problems (GSM8K dataset) and web-based tasks (WebShop dataset). The resulting model achieved strong performance on both benchmarks, showcasing M2N2’s ability to create powerful, multi-skilled models.

*A model merge with M2N2 combines the best of both seed models Source: arXiv*

Finally, the team merged diffusion-based image generation models. They combined a model trained on Japanese prompts (JSDXL) with three Stable Diffusion models primarily trained on English prompts. The objective was to create a model that combined the best image generation capabilities of each seed model while retaining the ability to understand Japanese. The merged model not only produced more photorealistic images with better semantic understanding but also developed an emergent bilingual ability. It could generate high-quality images from both English and Japanese prompts, even though it was optimized exclusively using Japanese captions.

For enterprises that have already developed specialist models, the business case for merging is compelling. The authors point to new, hybrid capabilities that would be difficult to achieve otherwise. For example, merging an LLM fine-tuned for persuasive sales pitches with a vision model trained to interpret customer reactions could create a single agent that adapts its pitch in real time based on live video feedback. This unlocks the combined intelligence of multiple models at the cost and latency of running just one.

Looking ahead, the researchers see techniques like M2N2 as part of a broader trend toward “model fusion.” They envision a future where organizations maintain entire ecosystems of AI models that are continuously evolving and merging to adapt to new challenges.

“Think of it like an evolving ecosystem where capabilities are combined as needed, rather than building one giant monolith from scratch,” the authors suggest.

The researchers have released the code of M2N2 on GitHub.

The biggest hurdle to this dynamic, self-improving AI ecosystem, the authors believe, is not technical but organizational. “In a world with a large ‘merged model’ made up of open-source, commercial, and custom components, ensuring privacy, security, and compliance will be a critical problem.” For businesses, the challenge will be figuring out which models can be safely and effectively absorbed into their evolving AI stack.

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

How AWS is reinventing the telco revenue model

Consider what that means for the mobile operator and its relationship with its customers. Instead of selling a generic 5G pipe with a static SLA, a telco can now sell a dynamic, guaranteed slice for a specific use case—say, a remote robotic surgery setup or a high-density, low-latency industrial IoT

What’s the biggest barrier to AI success?

AI’s challenge starts with definition. We hear all the time about how AI raises productivity, and many have experienced that themselves. But what, exactly, does “productivity” mean? To the average person, it means they can do things with less effort, which they like, so it generates a lot of favorable

IBM proposes unified architecture for hybrid quantum-classical computing

Quantum computers and classical HPC are traditionally “disparate systems [that] operate in isolation,” IBM researchers explain in a new paper. This can be “cumbersome,” because users have to manually orchestrate workflows, coordinate scheduling, and transfer data between systems, thus hindering productivity and “severely” limiting algorithmic exploration. But a hybrid approach

FluidCloud’s Large Infrastructure Model targets the multicloud networking gap

“It’s a mixture of multiple models,” Omar told Network World. “The conversion and the core capability are not an LLM; it’s our own conditional model.” A standard LLM sits at the front end to parse user intent. The Terraform generation and cloud-to-cloud conversion work runs on custom foundation models trained

Brent retreats from highs after Trump signals Iran war nearing end

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } Oil futures eased from recent highs Tuesday as markets reacted to comments from US President Donald Trump suggesting the war with Iran may be nearing its conclusion, easing concerns about prolonged disruptions to Middle East crude supplies. Brent crude had climbed above $100/bbl amid escalating tensions in the region and fears that the war could prolong disruptions to shipments through the Strait of Hormuz—one of the world’s most critical energy chokepoints and a transit route for roughly one-fifth of global oil supply. Prices pulled back after Pres. Trump said the war was “almost done,” prompting traders to reassess the risk premium that had built into crude markets during the latest escalation. 
The earlier gains were driven by the fact that the war had disrupted tanker traffic in the Strait of Hormuz, raising concerns about wider supply disruptions from major Gulf oil producers. While the latest remarks helped calm markets, analysts note that geopolitical risks remain elevated and price volatility is likely to persist as traders monitor developments in the region. Any renewed escalation could quickly send crude prices higher again.

Southwest Arkansas lithium project moves toward FID with 10-year offtake deal

Smackover Lithium, a joint venture between Standard Lithium Ltd. and Equinor, through subsidiaries of Equinor ASA, signed the first commercial offtake agreement for the South West Arkansas Project (SWA Project) with commodities group Trafigura Trading LLC. Under the terms of a binding take-or-pay offtake agreement, the JV will supply Trafigura with 8,000 metric tonnes/year (tpy) of battery-quality lithium carbonate (Li2CO3) over a 10-year period, beginning at the start of commercial production. Smackover Lithium is expected to achieve final investment decision (FID) for the project, which aims to use direct lithium extraction technology to produce lithium from brine resources in the Smackover formation in southern Arkansas, in 2026, with first production anticipated in 2028. The project encompasses about 30,000 acres of brine leases in the region, with the initial phase of project development focused on production from the 20,854-acre Reynolds Brine Unit. Front-end engineering design was completed in support of a definitive feasibility study with a principal recommendation that the project is ready to progress to FID. While pricing terms of the Trafigura deal were kept confidential, Standard Lithium said they are “structured to support the anticipated financing for the project.” The JV is seeking to finalize customer offtake agreements for roughly 80% of the 22,500 tonnes of annual nameplate lithium carbonate capacity for the initial phase of the project. This agreement represents over 40% of the targeted offtake commitments. Formed in 2024, Smackover Lithium is developing multiple DLE projects in Southwest Arkansas and East Texas. Standard Lithium is operator of the projecs with 55% interest. Equinor holds the remaining 45% interest.

Equinor makes oil and gas discoveries in the North Sea

Equinor Energy AS discovered oil in the Troll area and gas and condensate in the Sleipner area of the North Sea. Byrding C discovery well 35/11-32 S in production license (PL) 090 HS was made 5 km northwest of Fram field in Troll. The well was drilled by the COSL Innovator rig in 373 m of water to 3,517 m TVD subsea. It was terminated in the Heather formation from the Middle Jurassic. The primary exploration target was to prove petroleum in reservoir rocks from the Late Jurassic deep marine equivalent to the Sognefjord formation. The secondary target was to prove petroleum and investigate the presence of potential reservoir rocks in two prospective intervals from the Middle Jurassic in deep marine equivalents to the Fensfjord formation. The well encountered a 22-m oil column in sandstone layers in the Sognefjord formation with a total thickness of 82 m, of which 70 m was sandstone with moderate to good reservoir properties. The oil-water contact was encountered. The secondary exploration target in the Fensfjord formation did not prove reservoir rocks or hydrocarbons. The well was not formation-tested, but data and samples were collected. The well has been permanently plugged. Preliminary estimates indicate the size of the discovery is 4.4–8.2 MMboe. Oil discovered in Byrding C will be produced using existing or future infrastructure in the area. The Frida Kahlo discovery was drilled from the Sleipner B platform in production license PL 046 northwest of Sleipner Vest and is estimated to contain 5–9 MMboe of gas and condensate. The well will be brought on stream as early as April. The four most recent exploration wells in the Sleipner area, drilled over a 3-month period, include Lofn, Langemann, Sissel, and Frida Kahlo. All have all proven gas and condensate in the Hugin formation, with combined estimated

IEA launches record strategic oil release as Middle East war disrupts supply

The International Energy Agency (IEA) on Mar. 11 approved the largest emergency oil stock release in its history, making 400 million bbl available from member-country reserves in response to market disruptions tied to the war in the Middle East. The coordinated action, agreed unanimously by the IEA’s 32 member countries, is intended to ease supply pressure and temper price volatility as crude markets react to disrupted flows through the Strait of Hormuz. “The conflict in the Middle East is having significant impacts on global oil and gas markets, with major implications for energy security, energy affordability and the global economy for oil,” IEA executive director Fatih Birol said. The release more than doubles the previous IEA record set in 2022, when member countries collectively made 182.7 million bbl available following Russia’s invasion of Ukraine. Under the IEA system, member countries are required to maintain emergency oil stocks equal to at least 90 days of net imports, giving the agency a mechanism to respond when severe disruptions threaten global supply. The move comes after crude prices surged amid concerns that the US-Iran war could lead to prolonged disruption of exports from the Gulf. Despite the planned stock release, traders remain uncertain about whether reserve barrels alone will be enough to offset losses if the disruption persists. IEA said the emergency barrels will be supplied to the market from government-controlled and obligated industry stocks held across member countries. The action marks the sixth coordinated stock release in the agency’s history and underscores the seriousness of the current supply shock. Earlier the day, Japanese Prime Minister Sanae Takaichi said that Japan might start using its strategic oil reserves as early as next week, citing Japan’s unusually high dependence on Middle Eastern crude oil.

Infographic: Strait of Hormuz energy trade 2025

BOEM: US OCS holds 65.8 billion bbl of technically recoverable reserves

The US Outer Continental Shelf (OCS) holds mean undiscovered technically recoverable resources (UTRR) of 65.8 billion bbl of oil and 218.43 tcf of natural gas, the US Bureau of Ocean Energy Management (BOEM) said Mar. 9. Based on current production trends, these undiscovered resources represent the potential for 100 or more years of energy production from the US Outer Continental Shelf (OCS), BOEM said. A large portion of undiscovered OSC resources is located offshore the Gulf of Mexico and Alaska, according to the report. The offshore Gulf holds 26.9 million bbl of oil and 45.59 tcf of gas, while offshore Alaska holds an estimated mean 24.1 million bbl of oil and 122.29 tcf of gas. Offshore Pacific holds a mean UTRR of 10.3 million barrels of oil and 16.2 trillion cubic feet of gas, the report said. Offshore Atlantic holds a mean UTRR of 10.3 billion barrels of oil and 16.2 trillion cubic feet of gas. The assessment also evaluates the impact of prices on hydrocarbon recovery. Alaska is particularly price-sensitive, with mean undiscovered economically recoverable resources (UERR) negligible until prices average $100/bbl and $17.79/Mcf. At those levels, the mean UERR stands at 6.25 billion bbl and 13.25 tcf. At $160/bbl and $28.47/Mcf, recoverable resources jump to 14.67 billion bbl and 58.78 tcf. In the Gulf of Mexico, the mean UERR is 17.51 billion bbl of oil and 13.71 tcf at average prices of $60/bbl and $3.20/Mcf, increasing to 20.51 billion bbl and 17.49 tcf at average prices of $100/bbl and $5.34/Mcf, respectively. BOEM conducts a national resource assessment every 4 years to understand the “distribution of undiscovered oil and gas resources on the OCS” and identify opportunities for additional oil and gas exploration and development. “The Outer Continental Shelf holds tremendous resource potential,” said BOEM Acting Director Matt Giacona. “This

Data mining? Old servers could become new source of rare earths

For decades, he said, “the retirement of data center equipment was treated almost entirely as a compliance and disposal issue. Enterprises focused on secure decommissioning, certified recycling, and documented destruction of sensitive hardware. Once equipment left production environments, its economic life was assumed to be largely finished.” That assumption, he pointed out, “is beginning to change, because the hardware inside modern data centres contains a wide range of strategically important materials. Servers, storage systems, networking equipment, and power components contain copper, aluminum, silver, gold, and increasingly small but significant quantities of rare earth elements and other critical minerals.” These materials play a vital role in the manufacturing of semiconductors, energy systems, defense electronics, and advanced computing infrastructure, he explained, noting, “as global demand for digital infrastructure continues to expand, the volume of retired hardware entering disposal channels is rising quickly.” Electronic waste has already become one of the fastest growing waste streams in the world. “Global volumes now exceed 60 million tonnes annually and are projected to move toward eighty million tonnes by the end of the decade if current trends continue,” he said. “Data center infrastructure represents only a portion of that total, but it is a particularly important portion because it is concentrated, professionally managed, and replaced in structured cycles.” For a metals producer, he said, data center infrastructure represents a highly attractive feedstock, because unlike consumer electronics, enterprise hardware is replaced in large batches and flows through professional asset management channels. That predictability, said Gogia, “allows recyclers to design specialized processes that target specific components and materials. 
Over time, this creates the foundation for an industrial scale circular supply chain in which retired electronics feed back into the production of new materials.”

Meta is developing more AI chips for itself

With demand for AI chips rising and supplies tightening, Meta is taking its AI computing needs into its own hands and developing more of its own chips: It will produce four new generations of chips over the next two years. Cloud computing giants including Meta, AWS, and Google have been keen to develop their own chips to improve the performance of their own data centers. Meta started its own chip program in 2023, when it implemented the Meta Training and Inference Accelerator (MTIA), a family of custom-built silicon chips to power its AI workloads efficiently. The MTIA 300, which Meta will use for ranking and recommendations training, is already in production, Meta said. It will use the other planned chips, the MTIA 400, 450, and 500, mainly for generative AI inference production, it said.

Arista targets AI data centers with new liquid cooled pluggable optic module

To prove their point, the authors imagined a 400 MW AI datacenter with 1024 GPU racks of 128 GPUs each for a total of 128,000 GPUs. “Assume 12.8T scale-up and 1.6T scale-out bandwidth per GPU. With OSFP switch racks that have a density of 1.6 Pbps per rack, this would require more than 1,400 switch racks for scale-up and scale-out fabrics. With XPO, this would require 75% fewer racks, saving over 1,050 racks or 44 % of the floor space,” Bechtolsheim and Vusirikala stated in the blog. “Eliminating 75% of switch racks translates to massive reductions in construction and infrastructure costs, including power distribution, plumbing and installation costs, while accelerating deployment timelines,” Bechtolsheim and Vusirikala stated. Arista said the water-cooling capability of XPO is also an important feature. “All large AI data centers will be liquid cooled and the switches that go into these data centers also need to be liquid cooled,” Bechtolsheim and Vusirikala stated. “While one can add liquid cooled cold plates on flat-top OSFP modules, this does not substantially improve thermal performance.” XPO solves this problem by integrating a liquid cold plate inside the module, with two 32-channel paddle cards sharing the common cold plate which can cool both low power as well as high-power optics such as 8x1600G-ZR/ZR+ with up to 400W of power, Bechtolsheim and Vusirikala stated. XPO modules are much simpler than OSPF modules which improves reliability as well. “Each 32-channel paddle card has only one microcontroller and one set of voltage converters, a 75% reduction in common components versus 4 OSFPs,” Bechtolsheim and Vusirikala wrote.

Cisco grows high-end optical support for AI clusters

Cisco has also upgraded its Network Conversion System (NCS) with a 1RU, 800GE line card offering 12.8T capacity, with 32 OSFP-based ports for 100GE, 400GE, and 800GE clients and 800ZR/ZR+ WDM trunks. The NCS 1014 doubles the density of previous-generation NCS versions and now includes MACsec encryption (IEEE 802.1AE) to secure point-to-point links with hardware-based encryption, data integrity, and authentication for Ethernet traffic, Ghioni stated. It supports enhanced capacity and performance with C&L-band support and NCS 1014 systems with the 2.4T WDM line card based on the Coherent Interconnect Module 8 and now supports 800 GE clients, which can be mapped directly to a wavelength or inverse multiplexed across two wavelengths to maximize reach, Ghioni wrote. In the pluggable optic arena, Cisco is now offering a Quad Small Form Factor Pluggable Double Density (QSFP-DD) Pluggable Protection Switch Module that can monitor the optical link and switch traffic if it detects a fault in less than 50 milliseconds. The module occupies a quarter of the rack space compared to traditional protection devices—offering 90% rack space saving over available options, Ghioni wrote. It is aimed at Metro and DCI network customers where sub-50 ms failure recovery is essential and data centers needing fiber protection without bulky hardware, Ghioni stated. Cisco also added its Acacia developed Bright QSFP28 100ZR 0 dBm coherent optical pluggable in a standard QSFP28 form factor. It is aimed at edge, access, enterprise, and campus network deployment. Cisco has been actively growing its optical portfolio recently adding the Cisco Silicon One G300, which powers 102.4T N9000 and Cisco 8000 systems, as well as advanced 1.6T OSFP optics and 800G Linear Pluggable Optics.

Datalec targets rapid infrastructure deployment with new modular data centers

“We are engineering the data center with a new lens bringing pre-engineered system designs that are flexible and adaptable that enables a tailored solution for clients,” said John Lever, director of modular solutions at Datalec. The systems are flexible enough that these solutions cater for all types of data center, from standard server technology to AI and high-density compute. Datalec also provides “bolt-on” solutions, including a ‘digital wrapper’ including digital twinning and lifecycle and global support, Lever says. Another way Datalec says it differentiates from competing modular designs is a larger share of work is done offsite in a controlled manufacturing environment, which cuts onsite construction time, improves safety and limits disruption to live facilities, Lever says. The company competes with other modular data center vendors including Schneider Electric, Vertiv, Flex many others. DPI’s says its services are aimed at colocation providers, hyperscale and AI infrastructure teams, and large enterprises that need to add capacity quickly, safely and cost effectively across multiple regions.

Study finds significant savings from direct current power for AI workloads

The result is a 50% to 80% reduction in copper usage, due to fewer conductors and less parallel cabling, and an 8% to 12% reduction in annual energy-related OpEx through lower conversion and distribution losses. By reducing conductor count, cabling, and redundant power components, 800VDC enables meaningful savings at both build-out and operational stages. AI-first facilities can see a $4 million to $8 million in CapEx savings per 10 MW build by reducing upstream AC. For a one-gigawatt data center, you’re saving a couple million pounds of copper wire, he said. Burke says an all-DC data center is best done with a whole new facility rather than retrofitting old facilities. “[DC] is going to be in a lot of greenfield data centers that are going to be built, and data centers that are going to go to higher compute power are also going to DC,” he said. He did recommend all-DC retrofits for existing data centers that are going to employ high power computing with GPUs. Enteligent’s unnamed and as yet unreleased product is a converter that takes 800 volts and partitions it to 50 volts for the computing servers. The company will provide a new power supply, power shelf that converts 800 volts DC to 50 volts DC much more efficiently than any current power supplies. Burke said the company is doing NDA level testing and pilot programs now with its product, but it will be making a formal announcement within the next few weeks. There are a number of players in the DC arena focusing on different parts of the power supply market including Vertiv, Rutherford, Siemens, Eaton and many more.

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular non-tech exhibitor at the big tech trade show in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
