Training Large Language Models: From TRPO to GRPO

DeepSeek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning (RL) side of things: we will cover TRPO, PPO, and, more recently, GRPO (don’t worry, I will explain all these terms soon!).

I have aimed to keep this article relatively easy to read and accessible, by minimizing the math, so you won’t need a deep Reinforcement Learning background to follow along. However, I will assume that you have some familiarity with Machine Learning, Deep Learning, and a basic understanding of how LLMs work.

I hope you enjoy the article!

The 3 steps of LLM training

The 3 steps of LLM training [1]

Before diving into RL specifics, let’s briefly recap the three main stages of training a Large Language Model:

  • Pre-training: the model is trained on a massive dataset to predict the next token in a sequence based on preceding tokens.
  • Supervised Fine-Tuning (SFT): the model is then fine-tuned on more targeted data and aligned with specific instructions.
  • Reinforcement Learning (often called RLHF, for Reinforcement Learning from Human Feedback): this is the focus of this article. The main goal is to further align the model’s responses with human preferences, by allowing the model to learn directly from feedback.

Reinforcement Learning Basics

A robot trying to exit a maze! [2]

Before diving deeper, let’s briefly revisit the core ideas behind Reinforcement Learning.

RL is quite straightforward to understand at a high level: an agent interacts with an environment. The agent resides in a specific state within the environment and can take actions to transition to other states. Each action yields a reward from the environment: this is how the environment provides feedback that guides the agent’s future actions. 

Consider the following example: a robot (the agent) navigates (and tries to exit) a maze (the environment).

  • The state is the current situation of the environment (the robot’s position in the maze).
  • The robot can take different actions: for example, it can move forward, turn left, or turn right.
  • Successfully navigating towards the exit yields a positive reward, while hitting a wall or getting stuck in the maze results in negative rewards.
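To make the loop of state, action, and reward concrete, here is a minimal sketch of a toy version of the maze: a one-dimensional corridor where the robot moves at random. Everything in it (the function name `run_episode`, the corridor length, the reward values) is made up for illustration and is not taken from any RL library.

```python
import random

def run_episode(length=5, max_steps=50, seed=0):
    """Random agent in a 1-D corridor; the exit sits at position `length`."""
    rng = random.Random(seed)
    position = 0                 # state: where the robot currently is
    total_reward = 0.0
    for step in range(max_steps):
        action = rng.choice([-1, 1])        # action: step backward or forward
        next_position = position + action
        if next_position < 0:               # bumped into the wall behind the start
            reward = -1.0
            next_position = 0
        elif next_position == length:       # reached the exit
            reward = 10.0
        else:                               # ordinary move: small cost per step
            reward = -0.1
        total_reward += reward              # feedback from the environment
        position = next_position
        if position == length:
            return total_reward, step + 1
    return total_reward, max_steps          # gave up before finding the exit

total_reward, steps_taken = run_episode()
```

A learning agent would use these rewards to move forward more often than backward; here the agent never learns, it only shows how the pieces interact.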

Easy! Now, let’s draw an analogy with how RL is used in the context of LLMs.

RL in the context of LLMs

Simplified RLHF Process [3]

When used during LLM training, RL is defined by the following components:

  • Agent: the LLM itself.
  • Environment: everything external to the LLM, including user prompts, feedback systems, and other contextual information. This is basically the framework the LLM is interacting with during training.
  • Actions: these are responses to a query from the model. More specifically: these are the tokens that the LLM decides to generate in response to a query.
  • State: the current query being answered along with tokens the LLM has generated so far (i.e., the partial responses).
  • Rewards: this is a bit more tricky here: unlike the maze example above, there is usually no binary reward. In the context of LLMs, rewards usually come from a separate reward model, which outputs a score for each (query, response) pair. This model is trained from human-annotated data (hence “RLHF”) where annotators rank different responses. The goal is for higher-quality responses to receive higher rewards.

Note: in some cases, rewards can actually be simpler. For example, in DeepSeekMath, rule-based approaches can be used because math responses tend to be more deterministic (the answer is either correct or wrong).

Policy is the final concept we need for now. In RL terms, a policy is simply the strategy for deciding which action to take. In the case of an LLM, the policy outputs a probability distribution over possible tokens at each step: in short, this is what the model uses to sample the next token to generate. Concretely, the policy is determined by the model’s parameters (weights). During RL training, we adjust these parameters so the LLM becomes more likely to produce “better” tokens— that is, tokens that produce higher reward scores.

We often write the policy as:

$$\pi_\theta(a \mid s)$$

where a is the action (a token to generate), s is the state (the query and the tokens generated so far), and θ denotes the model’s parameters.
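As a rough illustration of what “a probability distribution over possible tokens” means, here is a minimal sketch that turns made-up scores (logits) into probabilities with a softmax and samples one token. The four-word vocabulary, the logit values, and the function name `policy` are all assumptions for the example; a real LLM does the same thing over tens of thousands of tokens.

```python
import math
import random

def policy(logits, temperature=1.0):
    """Softmax: turn raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["4", "four", "5", "banana"]           # toy vocabulary (made up)
logits = [3.0, 2.0, 0.5, -1.0]                 # made-up scores for the state "2+2 is"
probs = policy(logits)                         # pi_theta(. | s)

rng = random.Random(0)
next_token = rng.choices(vocab, weights=probs, k=1)[0]   # sample the action a
```

RL training changes the parameters that produce the logits, so that tokens leading to higher rewards end up with higher probabilities.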

This idea of finding the best policy is the whole point of RL! Since we don’t have labeled data (like we do in supervised learning) we use rewards to adjust our policy to take better actions. (In LLM terms: we adjust the parameters of our LLM to generate better tokens.)

TRPO (Trust Region Policy Optimization)

An analogy with supervised learning

Let’s take a quick step back to how supervised learning typically works: you have labeled data and use a loss function (like cross-entropy) to measure how close your model’s predictions are to the true labels.

We can then use algorithms like backpropagation and gradient descent to minimize our loss function and update the weights θ of our model.

Recall that our policy also outputs probabilities! In that sense, it is analogous to the model’s predictions in supervised learning… We are tempted to write something like:

$$L(\theta) = -\,\mathbb{E}_{s,a}\left[\log \pi_\theta(a \mid s)\, A(s, a)\right]$$

where s is the current state and a is a possible action.

A(s, a) is called the advantage function and measures how good the chosen action is in the current state, compared to a baseline. This is very much like the notion of labels in supervised learning, but derived from rewards instead of explicit labeling. To simplify, we can write the advantage as:

$$A(s, a) = \text{reward} - \text{baseline}$$

In practice, the baseline is calculated using a value function. This is a common term in RL that I will explain later. What you need to know for now is that it measures the expected reward we would receive if we continue following the current policy from the state s.
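To see how the advantage plays the role of a label, here is a minimal sketch of an advantage-weighted log-likelihood loss, in the spirit of cross-entropy. The function names and the numbers are invented for the example, and the baseline is simply given as a constant rather than predicted by a value function.

```python
import math

def advantage(reward, baseline):
    """How much better (or worse) the action did than expected."""
    return reward - baseline

def policy_loss(samples):
    """Average of -log pi(a|s) * A(s, a) over (probability, reward, baseline) triples."""
    terms = [-math.log(prob) * advantage(reward, baseline)
             for prob, reward, baseline in samples]
    return sum(terms) / len(terms)

samples = [
    (0.60, 0.9, 0.5),   # better than expected: advantage +0.4
    (0.30, 0.2, 0.5),   # worse than expected: advantage -0.3
]
loss = policy_loss(samples)
```

Minimizing this loss pushes up the probability of actions with a positive advantage and pushes down the probability of actions with a negative one.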

What is TRPO?

TRPO (Trust Region Policy Optimization) builds on this idea of using the advantage function but adds a critical ingredient for stability: it constrains how far the new policy can deviate from the old policy at each update step (loosely comparable to keeping the step size small in gradient descent).

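  • It introduces a KL divergence term (see it as a measure of how different two distributions are) between the current and the old policy, and requires it to stay below a small threshold δ:

$$D_{KL}\left(\pi_{\theta_{old}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\right) \le \delta$$

  • It also divides the policy by the old policy. This ratio, π_θ(a|s) / π_θold(a|s), multiplied by the advantage function, gives us a sense of how beneficial each update is relative to the old policy.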

Putting it all together, TRPO tries to maximize a surrogate objective (which involves the advantage and the policy ratio) subject to a KL divergence constraint.
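For readers who like to see it written down, the constrained problem is usually stated as follows. This is the standard textbook form, with δ denoting the size of the trust region; the notation is mine and may differ slightly from other presentations.

```latex
\max_{\theta} \;
\mathbb{E}_{s,a \sim \pi_{\theta_{old}}}
\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{old}}(a \mid s)} \, A(s, a) \right]
\quad \text{subject to} \quad
\mathbb{E}_{s}\left[ D_{KL}\left( \pi_{\theta_{old}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \right) \right] \le \delta
```

The expectation on the left is the surrogate objective, and the inequality on the right is the KL constraint that keeps each update inside the trust region.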

PPO (Proximal Policy Optimization)

While TRPO was a significant advancement, it is no longer widely used in practice, especially for training LLMs, because enforcing the KL constraint requires computationally expensive second-order calculations.

Instead, PPO is now the preferred approach for training most LLMs, including ChatGPT, Gemini, and more.

It is actually quite similar to TRPO, but instead of enforcing a hard constraint on the KL divergence, PPO introduces a “clipped surrogate objective” that implicitly restricts policy updates, and greatly simplifies the optimization process.

Here is a breakdown of the PPO objective function we maximize to tweak our model’s parameters.

Image by the Author
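Here is a minimal sketch of the clipped surrogate objective, to show what the clipping actually does. The function names are hypothetical, and the clipping range ε = 0.2 is an assumed value often used as a default, not something fixed by the algorithm.

```python
def clip(x, low, high):
    """Keep x inside the interval [low, high]."""
    return max(low, min(high, x))

def ppo_clipped_term(ratio, advantage, epsilon=0.2):
    """min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) for one action."""
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return min(unclipped, clipped)

def ppo_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Average clipped term over a batch; this is the quantity we maximize."""
    terms = [ppo_clipped_term(new / old, adv, epsilon)
             for new, old, adv in zip(new_probs, old_probs, advantages)]
    return sum(terms) / len(terms)

# Toy batch: the first action became more likely, the second less likely.
objective = ppo_objective(new_probs=[0.6, 0.1],
                          old_probs=[0.4, 0.2],
                          advantages=[1.0, -1.0])
```

In both toy cases the ratio has already left the interval [0.8, 1.2] in the direction the advantage favors, so the clipped value is used and pushing the ratio even further brings no extra gain. That is how PPO discourages overly large policy updates without an explicit constraint.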

GRPO (Group Relative Policy Optimization)

How is the value function usually obtained?

Let’s first talk more about the advantage and the value functions I introduced earlier.

In typical setups (like PPO), a value model is trained alongside the policy. Its goal is to predict the value of each action we take (each token generated by the model), using the rewards we obtain (remember that the value should represent the expected cumulative reward).

Here is how it works in practice. Take the query “What is 2+2?” as an example. Our model outputs “2+2 is 4” and receives a reward of 0.8 for that response. We then go backward and attribute discounted rewards to each prefix:

  • “2+2 is 4” gets a value of 0.8
  • “2+2 is” (1 token backward) gets a value of 0.8γ
  • “2+2” (2 tokens backward) gets a value of 0.8γ²
  • etc.

where γ is the discount factor (0.9 for example). We then use these prefixes and associated values to train the value model.
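The backward attribution above can be sketched in a few lines. The whitespace “tokenization” and the function name are simplifications for the example; a real setup would work on actual model tokens.

```python
def discounted_prefix_values(tokens, final_reward, gamma=0.9):
    """Attribute a discounted share of the final reward to every prefix."""
    n = len(tokens)
    targets = []
    for end in range(1, n + 1):
        prefix = " ".join(tokens[:end])
        steps_from_end = n - end               # how many tokens are still to come
        targets.append((prefix, final_reward * gamma ** steps_from_end))
    return targets

tokens = ["2+2", "is", "4"]                    # toy whitespace tokenization
targets = discounted_prefix_values(tokens, final_reward=0.8)
# "2+2 is 4" -> 0.8, "2+2 is" -> 0.8 * 0.9 = 0.72, "2+2" -> 0.8 * 0.81 = 0.648
```

Each (prefix, value) pair then becomes one training example for the value model.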

Important note: the value model and the reward model are two different things. The reward model is trained before the RL process and uses (query, response) pairs and human rankings. The value model is trained concurrently with the policy, and aims at predicting the future expected reward at each step of the generation process.

What’s new in GRPO

Even if, in practice, the reward model is often derived from the policy (training only the “head”), we still end up maintaining many models and handling multiple training procedures (policy, reward, value model). GRPO streamlines this by introducing a more efficient method.

Remember what I said earlier? The advantage compares the reward to a baseline.

In PPO, we decided to use our value function as the baseline. GRPO chooses something else: for each query, it generates a group of responses (a group of size G) and uses their rewards to calculate each response’s advantage as a z-score:

$$A_i = \frac{r_i - \mu}{\sigma}$$

where rᵢ is the reward of the i-th response, and μ and σ are the mean and standard deviation of the rewards in that group.
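The z-score computation is short enough to sketch directly. The rewards below are made up, the function name is hypothetical, and two details are assumptions of mine rather than part of the description above: I use the population standard deviation, and I add a tiny `eps` to avoid dividing by zero when every response in the group gets the same reward.

```python
import statistics

def group_advantages(rewards, eps=1e-8):
    """Advantage of each response = z-score of its reward within the group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)         # population std (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for G = 4 responses sampled for the same query (made-up values).
rewards = [0.8, 0.2, 0.5, 0.9]
advantages = group_advantages(rewards)
```

Responses that score above the group average get a positive advantage and the others get a negative one, with no value model involved.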

This naturally eliminates the need for a separate value model. This idea makes a lot of sense when you think about it! It aligns with the value function we introduced before and also measures, in a sense, an “expected” reward we can obtain. Also, this new method is well adapted to our problem because LLMs can easily generate multiple different outputs for the same query by sampling with a non-zero temperature (the parameter that controls the randomness of token generation).

This is the main idea behind GRPO: getting rid of the value model.

Finally, GRPO adds a KL divergence term (to be exact, GRPO uses a simple approximation of the KL divergence to improve the algorithm further) directly into its objective, comparing the current policy to a reference policy (often the post-SFT model).
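As far as I understand the DeepSeekMath paper, the approximation is a per-token estimator of the form ratio − log(ratio) − 1, where ratio is the reference policy’s probability of the token divided by the current policy’s. Here is a minimal sketch under that assumption; the function name and the log-probabilities are made up.

```python
import math

def kl_estimate(logp_current, logp_ref):
    """Per-token estimate: ratio - log(ratio) - 1, with ratio = pi_ref / pi_theta."""
    log_ratio = logp_ref - logp_current
    return math.exp(log_ratio) - log_ratio - 1.0

# Log-probabilities of the same three generated tokens under both policies (made up).
logps_current = [math.log(0.50), math.log(0.30), math.log(0.90)]
logps_ref = [math.log(0.25), math.log(0.30), math.log(0.80)]

per_token = [kl_estimate(c, r) for c, r in zip(logps_current, logps_ref)]
penalty = sum(per_token) / len(per_token)      # averaged over the response
```

The estimate is zero when both policies agree on a token and positive otherwise, so it works as a penalty for drifting away from the reference model.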

See the final formulation below:

Image by the Author
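Written out, and as far as I understand the DeepSeekMath paper, the objective has roughly the following shape. The symbols ρ (the policy ratio), ε (the clipping range) and β (the weight of the KL penalty) follow common usage and are my notation.

```latex
J_{GRPO}(\theta) =
\mathbb{E}_{q,\, \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{old}}}
\left[
\frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
\Big(
\min\big( \rho_{i,t}\, A_i,\; \mathrm{clip}(\rho_{i,t},\, 1-\varepsilon,\, 1+\varepsilon)\, A_i \big)
- \beta\, D_{KL}\left[ \pi_\theta \,\|\, \pi_{ref} \right]
\Big)
\right]

\rho_{i,t} = \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{old}}(o_{i,t} \mid q, o_{i,<t})},
\qquad
A_i = \frac{r_i - \mu}{\sigma}
```

The first term inside the parentheses is the PPO-style clipped ratio, A_i is the group-relative advantage, and the last term is the KL penalty with respect to the reference policy.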

And… that’s mostly it for GRPO! I hope this gives you a clear overview of the process: it still relies on the same foundational ideas as TRPO and PPO but introduces additional improvements to make training more efficient, faster, and cheaper — key factors behind DeepSeek’s success.

Conclusion

Reinforcement Learning has become a cornerstone for training today’s Large Language Models, particularly through PPO, and more recently GRPO. Each method rests on the same RL fundamentals — states, actions, rewards, and policies — but adds its own twist to balance stability, efficiency, and human alignment:

  • TRPO introduced strict policy constraints via KL divergence.
  • PPO eased those constraints with a clipped objective.
  • GRPO took an extra step by removing the value model requirement and using group-based reward normalization.

Of course, DeepSeek also benefits from other innovations, like high-quality data and other training strategies, but that is for another time!

I hope this article gave you a clearer picture of how these methods connect and evolve. I believe that Reinforcement Learning will become the main focus in training LLMs to improve their performance, surpassing pre-training and SFT in driving future innovations. 

If you’re interested in diving deeper, feel free to check out the references below or explore my previous posts.

Thanks for reading, and feel free to leave a clap and a comment!


Want to learn more about Transformers or dive into the math behind the Curse of Dimensionality? Check out my previous articles:

Transformers: How Do They Transform Your Data?
Diving into the Transformers architecture and what makes them unbeatable at language tasks (towardsdatascience.com)

The Math Behind “The Curse of Dimensionality”
Dive into the “Curse of Dimensionality” concept and understand the math behind all the surprising phenomena that arise… (towardsdatascience.com)



References:

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Four new vulnerabilities found in Ingress NGINX

NGINX is a reverse proxy/load balancer that generally acts as the front-end web traffic receiver and directs it to the application service for data transformation. Ingress NGINX is a version used in Kubernetes as the controller for traffic coming into the infrastructure. It takes care of mapping traffic to pods

Read More »

Phillips 66 to Cut Nearly 300 Jobs as LA Refinery Shuts

Phillips 66 will lay off around half of its employees at its sole remaining oil refinery in California after shuttering operations. The Houston-based company said it will cut 122 employees effective April 3 at two facilities in Carson and Wilmington that make up the company’s Los Angeles refinery, according a notice filed Monday with California’s employment regulator. This follows a separate notice last month that 155 employees will be terminated at the refinery in December, bringing the total to 277. The century-old refinery employs about 600 staff, according to Phillips 66’s website. The fuel-making plant has been slated to close since 2024 and the facility, once capable of processing 139,000 barrels of oil a day, refined its final barrel of crude in late 2025. Another Texas-based refiner, Valero Energy Corp., is also cutting more than 200 jobs in California this year as it idles a San Francisco Bay Area plant. Oil companies have decried what they call a hostile regulatory environment in the state, whose residents regularly pay the highest gasoline prices in the nation. Chevron Corp. officially relocated its headquarters to Texas in recent years and refiners have either fled or converted plants to producing biofuels, dwindling the in-state supply of petroleum products like gasoline, diesel and jet fuel. Some state lawmakers have recently tried to soften their stance toward the oil and gas industry. Phillips 66 continues to operate a biofuels refinery near San Francisco and import fossil fuels to California. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed.

Read More »

WTI, Brent Gain as Talks Ease Conflict Fears

Oil edged marginally higher after a choppy session as investors assessed the status of nuclear talks between the US and Iran. West Texas Intermediate settled above $63 a barrel, with markets reacting sharply to headlines tied to the meeting. Iranian Foreign Minister Abbas Araghchi said the talks had a “good start,” even as the Wall Street Journal reported that Tehran stood by its refusal to end enrichment of nuclear fuel, a major sticking point for the US. The escalation in the Middle East, which provides about a third of the world’s crude, has added a risk premium to benchmark oil prices. Traders have weighed the geopolitical tensions against an outlook for oversupply. Still, futures in New York notched their first weekly retreat since mid-December as the US-Iran talks helped allay concerns over a broader conflict in the region. Prices also extended gains after data showed US consumer sentiment unexpectedly improved to the highest in six months, calming some concerns over an economic slowdown in the country that could lead to weaker oil demand. Meanwhile, in trilateral negotiations with the US, Ukraine and Russia agreed to exchange prisoners for the first time in five months as they sought to end their four-year conflict. Talks were making progress, with results expected “in the coming weeks,” President Donald Trump’s special envoy said. Saudi Arabia cut prices for buyers in Asia by less than expected, signaling confidence in demand for its barrels, although prices have still been reduced to the lowest levels since late 2020. Oil Prices WTI for March delivery settled 0.4% higher at $63.55 a barrel in New York. Brent for April settlement rose 0.7% to close at $68.05 a barrel. What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy

Read More »

Saudis Cut Key Oil Price for Asian Buyers

Saudi Arabia cut the price of its main oil grade for buyers in Asia to the lowest in years, a further sign that global supplies are running ahead of demand. State oil producer Saudi Aramco will reduce the price of its Arab Light grade by 30 cents a barrel to parity with the regional benchmark for March, according to a price list seen by Bloomberg. That brings pricing for the kingdom’s most plentiful crude blend to the lowest level since late 2020. Still, Aramco’s cut was not as deep as buyers expected, coming in smaller than even the most modest estimate of a reduction in a survey of refiners and traders. That offers a sign that the kingdom has faith in demand for its barrels and Aramco’s Chief Executive Officer Amin Nasser has previously said that fears of a glut are overblown. Saudi Arabia’s monthly crude pricing is keenly watched by traders across the globe as it sets the tone for other sellers in the world’s top producing regions. Asia is the biggest market for Middle Eastern crude, with the prices set for refiners determining the profitability of processing and influencing the cost of fuels like gasoline and diesel the world over. Aramco also cut pricing for its Arab Medium and Arab Heavy crude grades to Asia to the lowest levels since mid 2020, while it increased prices for the Extra Light and Super Light blends. That split reflects that dynamic in the Middle East market where prices for the heavier and more sulfurous crudes that are most plentiful in the region have trailed those for the lighter blends. The OPEC+ producers group, led by Saudi Arabia and Russia, agreed to keep production levels steady during talks on Feb. 1, maintaining an earlier decision to forgo output increases to avoid

Read More »

Shell to Pause Kazakh Oil and Gas Investments

Shell Plc will pause investment in Kazakhstan as it navigates legal claims from the OPEC+ nation against oil majors that could stretch into the billions of dollars, Chief Executive Officer Wael Sawan said. Kazakhstan is pressing multiple western oil companies for compensation across a series of cases both in the Central Asian country’s courts and in international arbitration. This month, it emerged that Shell and partners lost a dispute that could see them pay as much as $4 billion. There is also ongoing litigation about sulfur breaches and project costs. “It does impact our appetite to invest further in Kazakhstan,” Sawan said Thursday during an earnings conference call with analysts. While the company sees plenty of investment opportunities in the future, “we will hold until we have a better line of sight to where things end up.” The setbacks in Kazakhstan come as Shell seeks to ensure future production growth with a healthy inventory of projects. Acquisitions have largely filled the company’s production gap through 2030, buying time to deal with the 2030-2035 period, Sawan said in an interview on Thursday. The Kazakh energy ministry didn’t reply to an emailed request for comment sent outside normal working hours. Sawan didn’t elaborate on whether the pause would apply to new or existing projects. Shell didn’t immediately respond to a request to clarify whether the CEO was talking about new or existing investments. The latest dispute was against the Karachaganak field joint venture led by Italy’s Eni SpA and Shell, over cost deductions. Other partners include Chevron Corp., Lukoil PJSC and KazMunayGas National Co. The venture may still appeal the decision.   Last year, the companies proposed settling the dispute by building a plant that would process natural gas from the field for domestic use. WHAT DO YOU THINK? Generated by readers,

Read More »

Tankers With Russian Oil Flock to East Asia

More than a dozen tankers loaded with Russian Urals oil are sailing toward Asia or idling along the route, a sign of producers racing to get cargoes closer to China as India pulls back from the trade.  These vessels — carrying a combined 10 million to 12 million barrels of oil — are spread across the Indian Ocean, and off the coasts of Malaysia, China and Russia. Five of them are indicating ‘for orders’ or ‘China for orders’ as their status, according to data intelligence firm Kpler, a category that usually means they don’t yet have a specific buyer or discharge port. Another six are signaling Singapore and Malaysia, and are likely heading to a popular spot for ship-to-ship transfers in the South China Sea where they can wait until the crude is bought. Four are floating off Malaysia, China and Russia’s Far East, without indicating a clear destination. Urals — Russia’s flagship crude grade, which is loaded from ports in the Baltic Sea — has become the variety of choice for Indian refiners since the invasion of Ukraine in early 2022 saw it become heavily discounted. But pressure from Washington has pushed imports lower, reaching an average of 1.2 million barrels a day in January compared with a peak of more than 2 million barrels a day in mid-2024. Indian imports of the crude could be trimmed further after President Donald Trump said on Monday the country would stop buying Russian oil as part of deal to cut trade tariffs. Prime Minister Narendra Modi confirmed the agreement but didn’t comment on oil. Some refiners are holding off purchases while they seek clarification from New Delhi.  The big question is where the surplus cargoes of Urals — the bulk of which have gone to India over the last few years — will now end up. China’s

Read More »

BP, KOC Sign ETSA Extension

In a statement sent to Rigzone on Thursday, BP announced that it and Kuwait Oil Company have signed an extension of the Enhanced Technical Services Agreement (ETSA) between the companies. The agreement “paves the way for both companies to collaboratively progress Kuwait’s most strategic asset fields”, BP noted in the statement. BP added that the deal enables it to “bring expertise in enhanced oil recovery to the Greater Burgan oil field and develop local capabilities with Kuwait Oil Company to manage the development of South and East Kuwait fields through 50 secondment opportunities of BP’s technical experts”. Rigzone asked BP to disclose the deal’s value. A BP spokesperson was unable to do so. The ETSA was originally signed in 2016 for a period of 10 years, the statement highlighted, adding that it will now extend through to March 2029. BP Executive Vice President, Gas & Low Carbon Energy, William Lin, noted in the statement, “BP’s commitment to Kuwait dates back to our participation in the discovery of the Greater Burgan oil field in the 1930s, and we appreciate the trust placed in our expertise in giant oil and gas fields to continue to help develop this important strategic asset”. “This is another example of the deep relationships we’ve formed across governments, partners, and supply chains in the regions where we operate. We look forward to continuing our strong collaboration with Kuwait and to working with KOC to help support the country’s long-term energy resilience,” he added. BP notes on its website that it was one of the founders of the original Kuwait Oil Company, which it highlighted first discovered oil at Burgan in 1938. “Exportation of KOC began in 1946, in which the first export of Kuwait crude was loaded on to the bp vessel ‘Fusilier’,” BP’s site adds. BP

Read More »

Nvidia’s $100 Billion OpenAI Bet Shrinks and Signals a New Phase in the AI Infrastructure Cycle

One of the most eye-popping figures of the AI boom – a proposed $100 billion Nvidia commitment to OpenAI and as much as 10 gigawatts of compute for the company’s Stargate AI infrastructure buildout – is no longer on the table. And that partial retreat tells the data center industry something important. According to multiple reports surfacing at the end of January, Nvidia has paused and re-scoped its previously discussed, non-binding investment framework with OpenAI, shifting from an unprecedented capital-plus-infrastructure commitment to a much smaller (though still massive) equity investment. What was once framed as a potential $100 billion alignment is now being discussed in the $20-30 billion range, as part of OpenAI’s broader effort to raise as much as $100 billion at a valuation approaching $830 billion. For data center operators, infrastructure developers, and power providers, the recalibration matters less for the headline number and more for what it reveals about risk discipline, competitive dynamics, and the limits of vertical circularity in AI infrastructure finance. From Moonshot to Measured Capital The original September 2025 memorandum reportedly contemplated not just capital, but direct alignment on compute delivery: a structure that would have tightly coupled Nvidia’s balance sheet with OpenAI’s AI-factory roadmap. By late January, however, sources indicated Nvidia executives had grown uneasy with both the scale and the structure of the deal. Speaking in Taipei on January 31, Nvidia CEO Jensen Huang pushed back on reports of friction, calling them “nonsense” and confirming Nvidia would “absolutely” participate in OpenAI’s current fundraising round. But Huang was also explicit on what had changed: the investment would be “nothing like” $100 billion, even if it ultimately becomes the largest single investment Nvidia has ever made. That nuance matters. Nvidia is not walking away from OpenAI. But it is drawing a clearer boundary around

Read More »

Data Center Jobs: Engineering, Construction, Commissioning, Sales, Field Service and Facility Tech Jobs Available in Major Data Center Hotspots

Each month Data Center Frontier, in partnership with Pkaza, posts some of the hottest data center career opportunities in the market. Here’s a look at some of the latest data center jobs posted on the Data Center Frontier jobs board, powered by Pkaza Critical Facilities Recruiting. Looking for Data Center Candidates? Check out Pkaza’s Active Candidate / Featured Candidate Hotlist Onsite Engineer – Critical FacilitiesCharleston, SC This is NOT a traveling position. Having degreed engineers seems to be all the rage these days. I can also use this type of candidate in following cities: Ashburn, VA; Moncks Corner, SC; Binghamton, NY; Dallas, TX or Indianapolis, IN. Our client is an engineering design and commissioning company that is a subject matter expert in the data center space. This role will be onsite at a customer’s data center. They will provide onsite design coordination and construction administration, consulting and management support for the data center / mission critical facilities space with the mindset to provide reliability, energy efficiency, sustainable design and LEED expertise when providing these consulting services for enterprise, colocation and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive salaries and benefits. Electrical Commissioning Engineer Ashburn, VA This traveling position is also available in: New York, NY; White Plains, NY;  Richmond, VA; Montvale, NJ; Charlotte, NC; Atlanta, GA; Hampton, GA; New Albany, OH; Cedar Rapids, IA; Phoenix, AZ; Salt Lake City, UT; Dallas, TX; Kansas City, MO; Omaha, NE; Chesterton, IN or Chicago, IL. *** ALSO looking for a LEAD EE and ME CxA Agents and CxA PMs *** Our client is an engineering design and commissioning company that has a national footprint and specializes in MEP critical facilities design. 
They provide design, commissioning, consulting and management expertise in the critical facilities space. They

Read More »

Operationalizing AI at Scale: Google Cloud on Data Infrastructure, Search, and Enterprise AI

The AI conversation has been dominated by model announcements, benchmark races, and the rapid evolution of large language models. But in enterprise environments, the harder problem isn’t building smarter models. It’s making them work reliably with real-world data. On the latest episode of the Data Center Frontier Show Podcast, Sailesh Krishnamurthy, VP of Engineering for Databases at Google Cloud, pulled back the curtain on the infrastructure layer where many ambitious AI initiatives succeed, or quietly fail. Krishnamurthy operates at the intersection of databases, search, and AI systems. His perspective underscores a growing reality across enterprise IT: AI success increasingly depends on how organizations manage, integrate, and govern data across operational systems, not just how powerful their models are. The Disconnect Between LLMs and Reality Enterprises today face a fundamental challenge: connecting LLMs to real-time operational data. Search systems handle documents and unstructured information well. Operational databases manage transactions, customer data, and financial records with precision. But combining the two remains difficult. Krishnamurthy described the problem as universal. “Inside enterprises, knowledge workers are often searching documents while separately querying operational systems,” he said. “But combining unstructured information with operational database data is still hard to do.” Externally, customers encounter the opposite issue. Portals expose personal data but struggle to incorporate broader contextual information. “You get a narrow view of your own data,” he explained, “but combining that with unstructured information that might answer your real question is still challenging.” The result: AI systems often operate with incomplete context. Vector Search Moves Into the Database Vector search has emerged as a bridge between structured and unstructured worlds. But its evolution over the past three years has changed how enterprises deploy it. 
Early use cases focused on semantic search, i.e. finding meaning rather than exact keyword matches. Bug tracking systems, for example, began

Read More »
