Stay Ahead, Stay ONMINE

Training Large Language Models: From TRPO to GRPO

Deepseek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low costs. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning (RL) side of things: we will cover […]

Deepseek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low costs. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning (RL) side of things: we will cover TRPO, PPO, and, more recently, GRPO (don’t worry, I will explain all these terms soon!) 

I have aimed to keep this article relatively easy to read and accessible, by minimizing the math, so you won’t need a deep Reinforcement Learning background to follow along. However, I will assume that you have some familiarity with Machine Learning, Deep Learning, and a basic understanding of how LLMs work.

I hope you enjoy the article!

The 3 steps of LLM training

The 3 steps of LLM training [1]

Before diving into RL specifics, let’s briefly recap the three main stages of training a Large Language Model:

  • Pre-training: the model is trained on a massive dataset to predict the next token in a sequence based on preceding tokens.
  • Supervised Fine-Tuning (SFT): the model is then fine-tuned on more targeted data and aligned with specific instructions.
  • Reinforcement Learning (often called RLHF for Reinforcement Learning with Human Feedback): this is the focus of this article. The main goal is to further refine responses’ alignments with human preferences, by allowing the model to learn directly from feedback.

Reinforcement Learning Basics

A robot trying to exit a maze! [2]

Before diving deeper, let’s briefly revisit the core ideas behind Reinforcement Learning.

RL is quite straightforward to understand at a high level: an agent interacts with an environment. The agent resides in a specific state within the environment and can take actions to transition to other states. Each action yields a reward from the environment: this is how the environment provides feedback that guides the agent’s future actions. 

Consider the following example: a robot (the agent) navigates (and tries to exit) a maze (the environment).

  • The state is the current situation of the environment (the robot’s position in the maze).
  • The robot can take different actions: for example, it can move forward, turn left, or turn right.
  • Successfully navigating towards the exit yields a positive reward, while hitting a wall or getting stuck in the maze results in negative rewards.

Easy! Now, let’s now make an analogy to how RL is used in the context of LLMs.

RL in the context of LLMs

Simplified RLHF Process [3]

When used during LLM training, RL is defined by the following components:

  • The LLM itself is the agent
  • Environment: everything external to the LLM, including user prompts, feedback systems, and other contextual information. This is basically the framework the LLM is interacting with during training.
  • Actions: these are responses to a query from the model. More specifically: these are the tokens that the LLM decides to generate in response to a query.
  • State: the current query being answered along with tokens the LLM has generated so far (i.e., the partial responses).
  • Rewards: this is a bit more tricky here: unlike the maze example above, there is usually no binary reward. In the context of LLMs, rewards usually come from a separate reward model, which outputs a score for each (query, response) pair. This model is trained from human-annotated data (hence “RLHF”) where annotators rank different responses. The goal is for higher-quality responses to receive higher rewards.

Note: in some cases, rewards can actually get simpler. For example, in DeepSeekMath, rule-based approaches can be used because math responses tend to be more deterministic (correct or wrong answer)

Policy is the final concept we need for now. In RL terms, a policy is simply the strategy for deciding which action to take. In the case of an LLM, the policy outputs a probability distribution over possible tokens at each step: in short, this is what the model uses to sample the next token to generate. Concretely, the policy is determined by the model’s parameters (weights). During RL training, we adjust these parameters so the LLM becomes more likely to produce “better” tokens— that is, tokens that produce higher reward scores.

We often write the policy as:

where a is the action (a token to generate), s the state (the query and tokens generated so far), and θ (model’s parameters).

This idea of finding the best policy is the whole point of RL! Since we don’t have labeled data (like we do in supervised learning) we use rewards to adjust our policy to take better actions. (In LLM terms: we adjust the parameters of our LLM to generate better tokens.)

TRPO (Trust Region Policy Optimization)

An analogy with supervised learning

Let’s take a quick step back to how supervised learning typically works. you have labeled data and use a loss function (like cross-entropy) to measure how close your model’s predictions are to the true labels.

We can then use algorithms like backpropagation and gradient descent to minimize our loss function and update the weights θ of our model.

Recall that our policy also outputs probabilities! In that sense, it is analogous to the model’s predictions in supervised learning… We are tempted to write something like:

where s is the current state and a is a possible action.

A(s, a) is called the advantage function and measures how good is the chosen action in the current state, compared to a baseline. This is very much like the notion of labels in supervised learning but derived from rewards instead of explicit labeling. To simplify, we can write the advantage as:

In practice, the baseline is calculated using a value function. This is a common term in RL that I will explain later. What you need to know for now is that it measures the expected reward we would receive if we continue following the current policy from the state s.

What is TRPO?

TRPO (Trust Region Policy Optimization) builds on this idea of using the advantage function but adds a critical ingredient for stability: it constrains how far the new policy can deviate from the old policy at each update step (similar to what we do with batch gradient descent for example).

  • It introduces a KL divergence term (see it as a measure of similarity) between the current and the old policy:
  • It also divides the policy by the old policy. This ratio, multiplied by the advantage function, gives us a sense of how beneficial each update is relative to the old policy.

Putting it all together, TRPO tries to maximize a surrogate objective (which involves the advantage and the policy ratio) subject to a KL divergence constraint.

PPO (Proximal Policy Optimization)

While TRPO was a significant advancement, it’s no longer used widely in practice, especially for training LLMs, due to its computationally intensive gradient calculations.

Instead, PPO is now the preferred approach in most LLMs architecture, including ChatGPT, Gemini, and more.

It is actually quite similar to TRPO, but instead of enforcing a hard constraint on the KL divergence, PPO introduces a “clipped surrogate objective” that implicitly restricts policy updates, and greatly simplifies the optimization process.

Here is a breakdown of the PPO objective function we maximize to tweak our model’s parameters.

Image by the Author

GRPO (Group Relative Policy Optimization)

How is the value function usually obtained?

Let’s first talk more about the advantage and the value functions I introduced earlier.

In typical setups (like PPO), a value model is trained alongside the policy. Its goal is to predict the value of each action we take (each token generated by the model), using the rewards we obtain (remember that the value should represent the expected cumulative reward).

Here is how it works in practice. Take the query “What is 2+2?” as an example. Our model outputs “2+2 is 4” and receives a reward of 0.8 for that response. We then go backward and attribute discounted rewards to each prefix:

  • “2+2 is 4” gets a value of 0.8
  • “2+2 is” (1 token backward) gets a value of 0.8γ
  • “2+2” (2 tokens backward) gets a value of 0.8γ²
  • etc.

where γ is the discount factor (0.9 for example). We then use these prefixes and associated values to train the value model.

Important note: the value model and the reward model are two different things. The reward model is trained before the RL process and uses pairs of (query, response) and human ranking. The value model is trained concurrently to the policy, and aims at predicting the future expected reward at each step of the generation process.

What’s new in GRPO

Even if in practice, the reward model is often derived from the policy (training only the “head”), we still end up maintaining many models and handling multiple training procedures (policy, reward, value model). GRPO streamlines this by introducing a more efficient method.

Remember what I said earlier?

In PPO, we decided to use our value function as the baseline. GRPO chooses something else: Here is what GRPO does: concretely, for each query, GRPO generates a group of responses (group of size G) and uses their rewards to calculate each response’s advantage as a z-score:

where rᵢ is the reward of the i-th response and μ and σ are the mean and standard deviation of rewards in that group.

This naturally eliminates the need for a separate value model. This idea makes a lot of sense when you think about it! It aligns with the value function we introduced before and also measures, in a sense, an “expected” reward we can obtain. Also, this new method is well adapted to our problem because LLMs can easily generate multiple non-deterministic outputs by using a low temperature (controls the randomness of tokens generation).

This is the main idea behind GRPO: getting rid of the value model.

Finally, GRPO adds a KL divergence term (to be exact, GRPO uses a simple approximation of the KL divergence to improve the algorithm further) directly into its objective, comparing the current policy to a reference policy (often the post-SFT model).

See the final formulation below:

Image by the Author

And… that’s mostly it for GRPO! I hope this gives you a clear overview of the process: it still relies on the same foundational ideas as TRPO and PPO but introduces additional improvements to make training more efficient, faster, and cheaper — key factors behind DeepSeek’s success.

Conclusion

Reinforcement Learning has become a cornerstone for training today’s Large Language Models, particularly through PPO, and more recently GRPO. Each method rests on the same RL fundamentals — states, actions, rewards, and policies — but adds its own twist to balance stability, efficiency, and human alignment:

TRPO introduced strict policy constraints via KL divergence

PPO eased those constraints with a clipped objective

GRPO took an extra step by removing the value model requirement and using group-based reward normalization. Of course, DeepSeek also benefits from other innovations, like high-quality data and other training strategies, but that is for another time!

I hope this article gave you a clearer picture of how these methods connect and evolve. I believe that Reinforcement Learning will become the main focus in training LLMs to improve their performance, surpassing pre-training and SFT in driving future innovations. 

If you’re interested in diving deeper, feel free to check out the references below or explore my previous posts.

Thanks for reading, and feel free to leave a clap and a comment!


Want to learn more about Transformers or dive into the math behind the Curse of Dimensionality? Check out my previous articles:

Transformers: How Do They Transform Your Data?
Diving into the Transformers architecture and what makes them unbeatable at language taskstowardsdatascience.com

The Math Behind “The Curse of Dimensionality”
Dive into the “Curse of Dimensionality” concept and understand the math behind all the surprising phenomena that arise…towardsdatascience.com



References:

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

AI-driven network management gains enterprise trust

The way the full process works is that the raw data feed comes in, and machine learning is used to identify an anomaly that could be a possible incident. That’s where the generative AI agents step up. In addition to the history of similar issues, the agents also look for

Read More »

Chinese cyberspies target VMware vSphere for long-term persistence

Designed to work in virtualized environments The CISA, NSA, and Canadian Cyber Center analysts note that some of the BRICKSTORM samples are virtualization-aware and they create a virtual socket (VSOCK) interface that enables inter-VM communication and data exfiltration. The malware also checks the environment upon execution to ensure it’s running

Read More »

IBM boosts DNS protection for multicloud operations

“In addition to this DNS synchronization, you can publish DNS configurations to your Amazon Simple Storage Service (S3) bucket. As you implement DNS changes, the S3 bucket will automatically update. The ability to store multiple configurations in your S3 bucket allows you to choose the most appropriate restore point if

Read More »

Crude Settles Lower

Oil eased by the most in almost three weeks as traders monitored India’s buying of Russian crude and refined products markets slumped, leading the energy complex lower. West Texas Intermediate futures fell 2% to settle near $59 a barrel, weighed down by losses in US equities, and have now been trading in a range of less than $4 since the start of November. Russian President Vladimir Putin last week promised “uninterrupted shipments” of fuel to India even as Moscow faces steeper sanctions over its war in Ukraine. The shipments will likely be a key point for discussions as US negotiators arrive in the South Asian nation for trade talks. “Oversupply concerns will eventually be realized, especially as Russian oil and refined product flows eventually circumvent existing sanctions,” said Vivek Dhar, an analyst with Commonwealth Bank of Australia. That will see Brent futures fall toward $60 a barrel through 2026, he said. Among products, gasoline futures dropped 2% in New York, after hitting the lowest level since May 2021 last week. Diesel prices also weakened in a drag on energy commodities across the board. The focus on Moscow’s flows comes as a potential peace deal between Ukraine and Russia also remained in focus. US President Donald Trump said he was disappointed in Ukrainian President Volodymyr Zelenskiy’s handling of a US proposal to end the nearly four-year-old war. Those tensions will be weighed against glut concerns, with higher supply from OPEC+ and producers outside the group — including the US, Brazil and Guyana — set to overwhelm tepid demand growth. The US’s Energy Information Administration, the International Energy Agency and the Organization of the Petroleum Exporting Countries will publish monthly market outlooks this week that may provide further insights. Both WTI and Brent remain on their longest runs below their 100-day moving

Read More »

Energy Department Announces $11 Million in Awards to Develop HALEU Transportation Packages

IDAHO FALLS, ID. —The U.S. Department of Energy (DOE) today announced $11 million in awards to five U.S. companies to develop and license new or modified transportation packages for high-assay low-enriched uranium (HALEU). The announcement was made during U.S. Secretary of Energy Chris Wright’s visit to Idaho National Laboratory (INL), marking the final stop in his ongoing tour of all 17 DOE National Laboratories. These selections advance President Trump’s recent executive orders and commitment to rebuild the Nation’s nuclear fuel cycle, strengthen domestic enrichment and fabrication capabilities, and accelerate the deployment of advanced reactors to usher in a new American nuclear renaissance. “From critical minerals to nuclear fuel, the Trump administration is fully committed to restoring the supply chains needed to secure America’s future,” said Secretary Wright. “Thanks to President Trump, the Energy Department is operating at record speeds to unleash the next American Nuclear Renaissance and to deliver more affordable, reliable, and secure energy for American families and businesses.” DOE’s $11 million in awards will support industry-led efforts to design, modify, and license transportation packages through the U.S. Nuclear Regulatory Commission (NRC). These investments will help establish long-term, economical HALEU transport capabilities that better serve domestic reactor developers and strengthen the U.S. nuclear supply chain. The following companies were selected to develop long-term economic solutions for the safe transport of HALEU through two topic areas: Topic Area 1: Develop new package designs that can be licensed by the NRC NAC International Westinghouse Electric Company Container Technologies Industries, LLC American Centrifuge Operating Paragon D&E Topic Area 2: Modify existing design packages for NRC certification NAC International Projects under Topic Area 1 will have performance periods of up to three years; the Topic Area 2 project will have a performance period of up to two years. Funding is provided through DOE’s

Read More »

Newsom Sparks Rebellion in Bay Area Town

A small city perched on San Francisco Bay poses a big obstacle to California Governor Gavin Newsom’s plans to prevent gasoline price spikes in a state that already pays more at the pump than any other.  Valero Energy Corp. plans to shut its refinery in Benicia in April, part of a wave of refinery closures across California as the state shifts away from fossil fuels. Newsom is counting on increased imports to ensure gas prices don’t soar, and his administration is exploring the Valero site — which is connected to a marine port — as a potential storage hub, said Benicia Mayor Steve Young.  The idea, however, doesn’t sit well with Young or other leaders in this community of 27,000, which relies on the refinery for jobs and taxes. If Valero can’t be persuaded to keep the refinery open, he would rather redevelop the site to attract a new industry, or fill it with retail and housing.   “We’re going to put up whatever resistance we can,” Young said in an interview. Making the site a fuel storage hub “is a terrible situation, because there are no jobs, there are no taxes and you have continuous emissions from tankers.”  Young and the governor’s staff discussed the idea in meetings last month, he said, with state officials asking if the city would accept a storage facility for up to 20 years. No formal proposal has been submitted to the city, he said. Young also warned that Benicia could push forward a ballot measure to tax gasoline imports, if necessary. The governor’s office said they “remain engaged with all interested and impacted stakeholders,” declining to comment further. Valero, based in San Antonio, Texas, didn’t respond to requests for comment. California has seen its fleet of refineries shrink as the state moves to renewable

Read More »

Sanctioned Russian LNG Plant Ships to China

A Russian liquefied natural gas export facility delivered its first shipment to China since being sanctioned by the US in January, the latest sign of increased energy cooperation between Beijing and Moscow. The Valera vessel, which loaded a shipment from Gazprom PJSC’s Portovaya facility on the Baltic Sea in October, arrived at the Beihai import terminal in southern China on Monday, ship data compiled by Bloomberg shows. Both Valera and Portovaya were sanctioned by Joe Biden’s administration to thwart Russia’s plans to boost LNG exports. China, which doesn’t recognize the unilateral sanctions, has increasingly bought blacklisted Russian gas over the last few months, ratcheting up energy ties between the two countries. Beijing has also ignored a broader push by US President Donald Trump to halt sales of Russian oil, which will likely be a key part of trade negotiations between Washington and New Delhi this week. Russia has two relatively small LNG export facilities on the Baltic Sea, with the Novatek PJSC-led Vysotsk plant also blacklisted by the US. Another sanctioned Russian plant, the Arctic LNG 2 site in Siberia, started delivering fuel to Beihai in late August. Total Russian LNG shipments to China, including from unsanctioned plants, rose about 14 percent from September through November from the same period a year earlier, ship data shows. If unloaded, Valera would be the 19th shipment of LNG into China from a blacklisted Russian plant since August, the data shows. In mid-October, satellite images showed a tanker that loaded at Portovaya transferring fuel into another vessel registered to a Hong Kong-based company near Malaysia. That ship, known as CCH Gas, has been sending out false location signals, and was spotted by satellites near China last month. It isn’t clear where it is currently located. What do you think? We’d love to hear from

Read More »

Key Oil Price Firm Will Ignore Fuel from Russia Crude

One of the world’s main companies for setting benchmark prices of physical commodities said it will start to ignore fuel that’s made from Russian crude when making its assessments. The step by Platts, a unit of S&P Global Energy, effectively eliminates one source of supply that might be cheaper than others. The move will align with European Union rules.  On Nov. 18, Intercontinental Exchange Inc. set out rules that are more restrictive than those of the EU, which allow diesel from a refinery that processes Russian barrels into the bloc, provided the fuel’s from a production line that uses non-Russian oil. By contrast, Platts said that bids and offers that it considers for its assessment process “are expected to carry the implicit guarantee that the oil product will satisfy the EU’s import ban.”  Platts’s two key types of price assessment are cargoes and barge loads of fuel.  Cargo assessments will cease reflecting products made from Russian crude from Dec. 15, the pricing agency said in a statement. Barge prices will stop doing so from Jan. 2. The new EU measures taking effect next year will ban imports of fuels made with Russian crude as part of efforts to cripple revenues that help fund the Kremlin’s war in Ukraine. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed.

Read More »

No Hurricanes Strike USA For 1st Time in a Decade

For the first time in a decade, not a single hurricane struck the U.S. this season, and that was a much needed break. That’s what Neil Jacobs, Under Secretary of Commerce for Oceans and Atmosphere, and National Oceanic and Atmospheric Administration (NOAA) Administrator, said in a statement posted on NOAA’s site recently, which summarized the Atlantic, Eastern Pacific, and Central Pacific hurricane seasons. “Still, a tropical storm caused damage and casualties in the Carolinas, distant hurricanes created rough ocean waters that caused property damage along the East Coast, and neighboring countries experienced direct hits from hurricanes,” Jacobs said in the statement. The NOAA statement noted that the Atlantic basin produced 13 named storms. Of these, five became hurricanes, including four major hurricanes, NOAA highlighted, pointing out that an average season has 14 named storms, seven hurricanes, and three major hurricanes. In the statement, NOAA said, overall, the season fell within the predicted ranges for named storms, hurricanes, and major hurricanes issued in NOAA’s seasonal outlooks. Hurricane season activity was near-normal for both the Eastern Pacific basin and Central Pacific basin and fell within predicted ranges, respectively, NOAA added in the statement. The organization highlighted that the Eastern Pacific basin hurricane season produced 18 named storms, “with nine becoming hurricanes and three intensifying to major hurricane status”. “Two named storms formed in the Central Pacific basin, with one, Iona, becoming a major hurricane well south of Hawaii,” NOAA added. “Eastern Pacific storms Henriette and Kiko were also hurricanes in the Central Pacific that passed northeast of Hawaii with little impact to the state,” it continued. AI Guidance In the NOAA statement, Jacobs said “the 2025 season was the first year NOAA’s National Hurricane Center incorporated Artificial Intelligence model guidance into their forecasts”. “The NHC [National Hurricane Center] performed exceedingly well when it came to forecasting rapid intensification for

Read More »

What does Arm need to do to gain enterprise acceptance?

But in 2017, AMD released the Zen architecture, which was equal if not superior to the Intel architecture. Zen made AMD competitive, and it fueled an explosive rebirth for a company that was near death a few years prior. AMD now has about 30% market share, while Intel suffers from a loss of technology as well as corporate leadership. Now, customers have a choice of Intel or AMD, and they don’t have to worry about porting their applications to a new platform like they would have to do if they switched to Arm. Analysts weigh in on Arm Tim Crawford sees no demand for Arm in the data center. Crawford is president of AVOA, a CIO consultancy. In his role, he talks to IT professionals all the time, but he’s not hearing much interest in Arm. “I don’t see Arm really making a dent, ever, into the general-purpose processor space,” Crawford said. “I think the opportunity for Arm is special applications and special silicon. If you look at the major cloud providers, their custom silicon is specifically built to do training or optimized to do inference. Arm is kind of in the same situation in the sense that it has to be optimized.” “The problem [for Arm] is that there’s not necessarily a need to fulfill at this point in time,” said Rob Enderle, principal analyst with The Enderle Group. “Obviously, there’s always room for other solutions, but Arm is still going to face the challenge of software compatibility.” And therein lies what may be Arm’s greatest challenge: software compatibility. Software doesn’t care (usually) if it’s on Intel or AMD, because both use the x86 architecture, with some differences in extensions. But Arm is a whole new platform, and that requires porting and testing. Enterprises generally don’t like disruption —

Read More »

Intel decides to keep networking business after all

That doesn’t explain why Intel made the decision to pursue spin-off in the first place. In July, NEX chief Sachin Katti issued a memo that outlined plans to establish key elements of the Networking and Communications business as a stand-alone company. It looked like a done deal, experts said. Jim Hines, research director for enabling technologies and semiconductors at IDC, declined to speculate on whether Intel could get a decent offer but noted NEX is losing ground. IDC estimates Intel’s market share in overall semiconductors at 6.8% in Q3 2025, which is down from 7.4% for the full year 2024 and 9.2% for the full year 2023. Intel’s course reversal “is a positive for Intel in the long term, and recent improvements in its financial situation may have contributed to the decision to keep NEX in house,” he said. When Tan took over as CEO earlier this year, prioritized strengthening the balance sheet and bringing a greater focus on execution. Divest NEX was aligned with these priorities, but since then, Intel has secured investments from the US Government, Nvidia and SoftBank that have reduced the need to raise cash through other means, Hines notes. “The NEX business will prove to be a strategic asset for Intel as it looks to protect and expand its position in the AI datacenter market. Success in this market now requires processor suppliers to offer a full-stack solution, not just silicon. Scale-up and scale-out networking solutions are a key piece of the package, and Intel will be able to leverage its NEX technologies and software, including silicon photonics, to develop differentiated product offerings in this space,” Hines said.

Read More »

At the Crossroads of AI and the Edge: Inside 1623 Farnam’s Rising Role as a Midwest Interconnection Powerhouse

That was the thread that carried through our recent conversation for the DCF Show podcast, where Severn walked through the role Farnam now plays in AI-driven networking, multi-cloud connectivity, and the resurgence of regional interconnection as a core part of U.S. digital infrastructure. Aggregation, Not Proximity: The Practical Edge Severn is clear-eyed about what makes the edge work and what doesn’t. The idea that real content delivery could aggregate at the base of cell towers, he noted, has never been realistic. The traffic simply isn’t there. Content goes where the network already concentrates, and the network concentrates where carriers, broadband providers, cloud onramps, and CDNs have amassed critical mass. In Farnam’s case, that density has grown steadily since the building changed hands in 2018. At the time an “underappreciated asset,” the facility has since become a meeting point for more than 40 broadband providers and over 60 carriers, with major content operators and hyperscale platforms routing traffic directly through its MMRs. That aggregation effect feeds on itself; as more carrier and content traffic converges, more participants anchor themselves to the hub, increasing its gravitational pull. Geography only reinforces that position. Located on the 41st parallel, the building sits at the historical shortest-distance path for early transcontinental fiber routes. It also lies at the crossroads of major east–west and north–south paths that have made Omaha a natural meeting point for backhaul routes and hyperscale expansions across the Midwest. AI and the New Interconnection Economy Perhaps the clearest sign of Farnam’s changing role is the sheer volume of fiber entering the building. More than 5,000 new strands are being brought into the property, with another 5,000 strands being added internally within the Meet-Me Rooms in 2025 alone. These are not incremental upgrades—they are hyperscale-grade expansions driven by the demands of AI traffic,

Read More »

Schneider Electric’s $2.3 Billion in AI Power and Cooling Deals Sends Message to Data Center Sector

When Schneider Electric emerged from its 2025 North American Innovation Summit in Las Vegas last week with nearly $2.3 billion in fresh U.S. data center commitments, it didn’t just notch a big sales win. It arguably put a stake in the ground about who controls the AI power-and-cooling stack over the rest of this decade. Within a single news cycle, Schneider announced: Together, the deals total about $2.27 billion in U.S. data center infrastructure, a number Schneider confirmed in background with multiple outlets and which Reuters highlighted as a bellwether for AI-driven demand.  For the AI data center ecosystem, these contracts function like early-stage fuel supply deals for the power and cooling systems that underpin the “AI factory.” Supply Capacity Agreements: Locking in the AI Supply Chain Significantly, both deals are structured as supply capacity agreements, not traditional one-off equipment purchase orders. Under the SCA model, Schneider is committing dedicated manufacturing lines and inventory to these customers, guaranteeing output of power and cooling systems over a multi-year horizon. In return, Switch and Digital Realty are providing Schneider with forecastable volume and visibility at the scale of gigawatt-class campus build-outs.  A Schneider spokesperson told Reuters that the two contracts are phased across 2025 and 2026, underscoring that this arrangement is about pipeline, as opposed to a one-time backlog spike.  That structure does three important things for the market: Signals confidence that AI demand is durable.You don’t ring-fence billions of dollars of factory output for two customers unless you’re highly confident the AI load curve runs beyond the current GPU cycle. Pre-allocates power & cooling the way the industry pre-allocated GPUs.Hyperscalers and neoclouds have already spent two years locking up Nvidia and AMD capacity. These SCAs suggest power trains and thermal systems are joining chips on the list of constrained strategic resources.

Read More »

The Data Center Power Squeeze: Mapping the Real Limits of AI-Scale Growth

As we all know, the data center industry is at a crossroads. As artificial intelligence reshapes the already insatiable digital landscape, the demand for computing power is surging at a pace that outstrips the growth of the US electric grid. As engines of the AI economy, an estimated 1,000 new data centers1 are needed to process, store, and analyze the vast datasets that run everything from generative models to autonomous systems. But this transformation comes with a steep price and the new defining criteria for real estate: power. Our appetite for electricity is now the single greatest constraint on our expansion, threatening to stall the very innovation we enable. In 2024, US data centers consumed roughly 4% of the nation’s total electricity, a figure that is projected to triple by 2030, reaching 12% or more.2 For AI-driven hyperscale facilities, the numbers are even more staggering. With the largest planned data centers requiring gigawatts of power, enough to supply entire cities, the cumulative demand from all data centers is expected to reach 134 gigawatts by 2030, nearly three times the current load.​3 This presents a systemic challenge. The U.S. power grid, built for a different era, is struggling to keep pace. Utilities are reporting record interconnection requests, with some regions seeing demand projections that exceed their total system capacity by fivefold.4 In Virginia and Texas, the epicenters of data center expansion, grid operators are warning of tight supply-demand balances and the risk of blackouts during peak periods.5 The problem is not just the sheer volume of power needed, but the speed at which it must be delivered. Data center operators are racing to secure power for projects that could be online in as little as 18 months, but grid upgrades and new generation can take years, if not decades. The result

Read More »

The Future of Hyperscale: Neoverse Joins NVLink Fusion as SC25 Accelerates Rack-Scale AI Architectures

Neoverse’s Expanding Footprint and the Power-Efficiency Imperative With Neoverse deployments now approaching roughly 50% of all compute shipped into top hyperscalers in 2025 (representing more than a billion Arm cores) and with nation-scale AI campuses such as the Stargate project already anchored on Arm compute, the addition of NVLink Fusion becomes a pivotal extension of the Neoverse roadmap. Partners can now connect custom Arm CPUs to their preferred NVIDIA accelerators across a coherent, high-bandwidth, rack-scale fabric. Arm characterized the shift as a generational inflection point in data-center architecture, noting that “power—not FLOPs—is the bottleneck,” and that future design priorities hinge on maximizing “intelligence per watt.” Ian Buck, vice president and general manager of accelerated computing at NVIDIA, underscored the practical impact: “Folks building their own Arm CPU, or using an Arm IP, can actually have access to NVLink Fusion—be able to connect that Arm CPU to an NVIDIA GPU or to the rest of the NVLink ecosystem—and that’s happening at the racks and scale-up infrastructure.” Despite the expanded design flexibility, this is not being positioned as an open interconnect ecosystem. NVIDIA continues to control the NVLink Fusion fabric, and all connections ultimately run through NVIDIA’s architecture. For data-center planners, the SC25 announcement translates into several concrete implications: 1.   NVIDIA “Grace-style” Racks Without Buying Grace With NVLink Fusion now baked into Neoverse, hyperscalers and sovereign operators can design their own Arm-based control-plane or pre-processing CPUs that attach coherently to NVIDIA GPU domains—such as NVL72 racks or HGX B200/B300 systems—without relying on Grace CPUs. A rack-level architecture might now resemble: Custom Neoverse SoC for ingest, orchestration, agent logic, and pre/post-processing NVLink Fusion fabric Blackwell GPU islands and/or NVLink-attached custom accelerators (Marvell, MediaTek, others) This decouples CPU choice from NVIDIA’s GPU roadmap while retaining the full NVLink fabric. In practice, it also opens

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »