Training Large Language Models: From TRPO to GRPO

DeepSeek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning (RL) side of things: we will cover TRPO, PPO, and, more recently, GRPO (don’t worry, I will explain all these terms soon!).

I have aimed to keep this article relatively easy to read and accessible, by minimizing the math, so you won’t need a deep Reinforcement Learning background to follow along. However, I will assume that you have some familiarity with Machine Learning, Deep Learning, and a basic understanding of how LLMs work.

I hope you enjoy the article!

The 3 steps of LLM training

The 3 steps of LLM training [1]

Before diving into RL specifics, let’s briefly recap the three main stages of training a Large Language Model:

  • Pre-training: the model is trained on a massive dataset to predict the next token in a sequence based on preceding tokens.
  • Supervised Fine-Tuning (SFT): the model is then fine-tuned on more targeted data and aligned with specific instructions.
  • Reinforcement Learning (often called RLHF, for Reinforcement Learning from Human Feedback): this is the focus of this article. The main goal is to further refine how well responses align with human preferences, by allowing the model to learn directly from feedback.

Reinforcement Learning Basics

A robot trying to exit a maze! [2]

Before diving deeper, let’s briefly revisit the core ideas behind Reinforcement Learning.

RL is quite straightforward to understand at a high level: an agent interacts with an environment. The agent resides in a specific state within the environment and can take actions to transition to other states. Each action yields a reward from the environment: this is how the environment provides feedback that guides the agent’s future actions. 

Consider the following example: a robot (the agent) navigates (and tries to exit) a maze (the environment).

  • The state is the current situation of the environment (the robot’s position in the maze).
  • The robot can take different actions: for example, it can move forward, turn left, or turn right.
  • Successfully navigating towards the exit yields a positive reward, while hitting a wall or getting stuck in the maze results in negative rewards.

Easy! Now, let’s make an analogy to how RL is used in the context of LLMs.

RL in the context of LLMs

Simplified RLHF Process [3]

When used during LLM training, RL is defined by the following components:

  • Agent: the LLM itself.
  • Environment: everything external to the LLM, including user prompts, feedback systems, and other contextual information. This is basically the framework the LLM is interacting with during training.
  • Actions: these are the model’s responses to a query. More specifically, these are the tokens that the LLM decides to generate in response to a query.
  • State: the current query being answered along with the tokens the LLM has generated so far (i.e., the partial response).
  • Rewards: this is a bit trickier here: unlike the maze example above, there is usually no binary reward. In the context of LLMs, rewards usually come from a separate reward model, which outputs a score for each (query, response) pair. This model is trained on human-annotated data (hence “RLHF”) where annotators rank different responses. The goal is for higher-quality responses to receive higher rewards.
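To make the idea of training a reward model from rankings more concrete, here is a minimal sketch. It assumes a pairwise (Bradley-Terry style) loss, which is a common choice but is not specified in this article; the function name and the toy scores are hypothetical.

```python
import math

def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Assumed pairwise loss: -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the reward model scores the response that
    annotators preferred above the one they rejected.
    """
    margin = score_preferred - score_rejected
    # log(1 + exp(-margin)) is the same quantity as -log(sigmoid(margin))
    return math.log1p(math.exp(-margin))

# Toy scores a reward model might assign to two responses to the same query.
good_order = pairwise_ranking_loss(score_preferred=2.0, score_rejected=0.5)
bad_order = pairwise_ranking_loss(score_preferred=0.5, score_rejected=2.0)
print(f"ranking respected: {good_order:.3f}, ranking violated: {bad_order:.3f}")
```

Minimizing this kind of loss over many annotated pairs pushes the scores of preferred responses up relative to rejected ones, which is the behavior described above.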

Note: in some cases, rewards can actually get simpler. For example, in DeepSeekMath, rule-based approaches can be used because math responses tend to be more deterministic (the answer is either correct or wrong).

Policy is the final concept we need for now. In RL terms, a policy is simply the strategy for deciding which action to take. In the case of an LLM, the policy outputs a probability distribution over possible tokens at each step: in short, this is what the model uses to sample the next token to generate. Concretely, the policy is determined by the model’s parameters (weights). During RL training, we adjust these parameters so the LLM becomes more likely to produce “better” tokens— that is, tokens that produce higher reward scores.

We often write the policy as:

$$\pi_\theta(a \mid s)$$

where a is the action (a token to generate), s is the state (the query and tokens generated so far), and θ is the set of model parameters.
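As a small illustration of a policy that outputs a probability distribution over tokens, the sketch below turns raw scores (logits) into probabilities with a softmax and samples one token. The four-token vocabulary, the logits, and the function name are all made up for the example; a real LLM computes the logits from its parameters θ.

```python
import math
import random

def policy_distribution(logits, temperature=1.0):
    """Softmax over logits: returns the probability of each candidate token."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and logits for the current state s.
vocab = ["4", "5", "four", "banana"]
logits = [3.0, 0.5, 2.0, -1.0]

probs = policy_distribution(logits)
rng = random.Random(0)
next_token = rng.choices(vocab, weights=probs, k=1)[0]  # the action a
print([round(p, 3) for p in probs], next_token)
```

Adjusting the model’s parameters during RL training amounts to shifting these probabilities toward tokens that lead to higher rewards.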

This idea of finding the best policy is the whole point of RL! Since we don’t have labeled data (like we do in supervised learning) we use rewards to adjust our policy to take better actions. (In LLM terms: we adjust the parameters of our LLM to generate better tokens.)

TRPO (Trust Region Policy Optimization)

An analogy with supervised learning

Let’s take a quick step back to how supervised learning typically works: you have labeled data and use a loss function (like cross-entropy) to measure how close your model’s predictions are to the true labels.

We can then use algorithms like backpropagation and gradient descent to minimize our loss function and update the weights θ of our model.

Recall that our policy also outputs probabilities! In that sense, it is analogous to the model’s predictions in supervised learning… We are tempted to write something like:

$$J(\theta) = \mathbb{E}_{s, a}\left[\log \pi_\theta(a \mid s) \cdot A(s, a)\right]$$

where s is the current state and a is a possible action.

A(s, a) is called the advantage function and measures how good the chosen action is in the current state, compared to a baseline. This is very much like the notion of labels in supervised learning, but derived from rewards instead of explicit labeling. To simplify, we can write the advantage as:

$$A(s, a) = \text{reward}(s, a) - \text{baseline}(s)$$

In practice, the baseline is calculated using a value function. This is a common term in RL that I will explain later. What you need to know for now is that it measures the expected reward we would receive if we continued following the current policy from the state s.
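The idea of weighting the log-probability of each chosen action by its advantage can be sketched as follows. This is only an illustration under simplifying assumptions: the advantage is taken as the reward minus a fixed baseline, and the probabilities, rewards, and function names are invented for the example.

```python
import math

def advantage(reward: float, baseline: float) -> float:
    """Simplified advantage: how much better the reward is than the baseline."""
    return reward - baseline

def policy_gradient_objective(probs_of_chosen, advantages) -> float:
    """Average of log pi(a|s) * A(s, a) over sampled (state, action) pairs."""
    terms = [math.log(p) * a for p, a in zip(probs_of_chosen, advantages)]
    return sum(terms) / len(terms)

# Each sample: (probability the policy gave the chosen action, reward, baseline).
samples = [(0.6, 1.0, 0.5), (0.3, 0.2, 0.5), (0.1, 0.9, 0.5)]
advs = [advantage(r, b) for _, r, b in samples]
objective = policy_gradient_objective([p for p, _, _ in samples], advs)
print(advs, objective)
```

Raising the probability of an action with a positive advantage increases this quantity, while raising the probability of an action with a negative advantage decreases it, which is why maximizing it favors better-than-baseline actions.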

What is TRPO?

TRPO (Trust Region Policy Optimization) builds on this idea of using the advantage function but adds a critical ingredient for stability: it constrains how far the new policy can deviate from the old policy at each update step (similar in spirit to keeping the step size small in gradient descent).

  • It introduces a KL divergence term (see it as a measure of how much two distributions differ) between the old and the current policy: $D_{\mathrm{KL}}(\pi_{\theta_{\text{old}}} \,\|\, \pi_\theta) \le \delta$, where δ is a small threshold.
  • It also divides the policy by the old policy: $\pi_\theta(a \mid s) / \pi_{\theta_{\text{old}}}(a \mid s)$. This ratio, multiplied by the advantage function, gives us a sense of how beneficial each update is relative to the old policy.

Putting it all together, TRPO tries to maximize a surrogate objective (which involves the advantage and the policy ratio) subject to a KL divergence constraint.
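For readers who want to see this written out, the constrained problem is usually stated as below. This follows the standard formulation from the TRPO literature rather than a formula given in this article; the threshold δ is a hyperparameter.

```latex
\max_{\theta} \;
\mathbb{E}_{s,\, a \sim \pi_{\theta_{\text{old}}}}
\left[
  \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, A(s, a)
\right]
\quad \text{subject to} \quad
\mathbb{E}_{s}
\left[
  D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \right)
\right] \le \delta
```

The expectation on the left is the surrogate objective (ratio times advantage), and the inequality on the right is the trust region that keeps the new policy close to the old one.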

PPO (Proximal Policy Optimization)

While TRPO was a significant advancement, it is no longer widely used in practice, especially for training LLMs, because enforcing its constraint requires computationally intensive second-order calculations.

Instead, PPO is now the preferred approach for training most LLMs, including ChatGPT, Gemini, and more.

It is actually quite similar to TRPO, but instead of enforcing a hard constraint on the KL divergence, PPO introduces a “clipped surrogate objective” that implicitly restricts policy updates, and greatly simplifies the optimization process.

Here is a breakdown of the PPO objective function we maximize to tweak our model’s parameters.

Image by the Author
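The clipping mechanism can be sketched in a few lines for a single action. This is a minimal illustration of the standard clipped surrogate, assuming a clip range ε = 0.2 (a commonly used default, not a value stated in this article); the function name is hypothetical.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO clipped term for one action.

    ratio is pi_theta(a|s) / pi_theta_old(a|s). The ratio is clipped to
    [1 - epsilon, 1 + epsilon], and the more pessimistic of the clipped
    and unclipped terms is kept.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)

# Compare a large policy change in both directions, for good and bad actions.
for ratio in (0.5, 1.0, 1.5):
    for adv in (1.0, -1.0):
        print(f"ratio={ratio}, advantage={adv}: {clipped_surrogate(ratio, adv):.2f}")
```

With a positive advantage, the term stops growing once the ratio exceeds 1 + ε, so there is no incentive to push the policy further in one update. With a negative advantage, the term is capped once the ratio drops below 1 − ε. This is how PPO restricts updates without an explicit KL constraint.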

GRPO (Group Relative Policy Optimization)

How is the value function usually obtained?

Let’s first talk more about the advantage and the value functions I introduced earlier.

In typical setups (like PPO), a value model is trained alongside the policy. Its goal is to predict the value of each action we take (each token generated by the model), using the rewards we obtain (remember that the value should represent the expected cumulative reward).

Here is how it works in practice. Take the query “What is 2+2?” as an example. Our model outputs “2+2 is 4” and receives a reward of 0.8 for that response. We then go backward and attribute discounted rewards to each prefix:

  • “2+2 is 4” gets a value of 0.8
  • “2+2 is” (1 token backward) gets a value of 0.8γ
  • “2+2” (2 tokens backward) gets a value of 0.8γ²
  • etc.

where γ is the discount factor (0.9 for example). We then use these prefixes and associated values to train the value model.
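The backward attribution described above can be reproduced with a short sketch. It follows the article’s simplified example, where each prefix is one “step” (real tokenizers would split the text differently), and the function name is made up for illustration.

```python
def prefix_value_targets(prefixes, final_reward, gamma=0.9):
    """Pair each prefix with final_reward * gamma**k.

    k is the number of steps between the prefix and the complete response,
    so the full response gets the undiscounted reward.
    """
    last = len(prefixes) - 1
    return [(prefix, final_reward * gamma ** (last - i))
            for i, prefix in enumerate(prefixes)]

# Prefixes of the response "2+2 is 4", from shortest to complete.
prefixes = ["2+2", "2+2 is", "2+2 is 4"]
targets = prefix_value_targets(prefixes, final_reward=0.8)
for prefix, value in targets:
    print(f"{prefix!r}: {value:.3f}")
# '2+2': 0.648
# '2+2 is': 0.720
# '2+2 is 4': 0.800
```

These (prefix, value) pairs are the kind of training targets the value model would be fitted on.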

Important note: the value model and the reward model are two different things. The reward model is trained before the RL process, using (query, response) pairs and human rankings. The value model is trained concurrently with the policy, and aims at predicting the expected future reward at each step of the generation process.

What’s new in GRPO

Even if, in practice, the reward model is often derived from the policy (training only the “head”), we still end up maintaining many models and handling multiple training procedures (policy, reward, and value models). GRPO streamlines this by introducing a more efficient method.

Remember what I said earlier: in practice, the baseline is calculated using a value function.

In PPO, we decided to use our value function as the baseline. GRPO chooses something else: for each query, it generates a group of responses (a group of size G) and uses their rewards to calculate each response’s advantage as a z-score:

$$A_i = \frac{r_i - \mu}{\sigma}$$

where rᵢ is the reward of the i-th response and μ and σ are the mean and standard deviation of rewards in that group.
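Here is a minimal sketch of this group-relative advantage. Two details are assumptions rather than something the article specifies: the population standard deviation is used, and a small constant is added to the denominator to avoid dividing by zero when all rewards in a group are equal. The rewards are toy values.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Z-score each reward against the mean and std of its own group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy rewards for G = 4 responses sampled for the same query.
rewards = [0.8, 0.2, 0.5, 0.9]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} -> advantage={a:+.2f}")
```

Responses that score above the group average get a positive advantage and those below get a negative one, so the group itself plays the role of the baseline.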

This naturally eliminates the need for a separate value model. This idea makes a lot of sense when you think about it! The group mean plays the same role as the value function we introduced before: it measures, in a sense, an “expected” reward we can obtain. Also, this new method is well adapted to our problem because LLMs can easily generate multiple different outputs for the same query by sampling with a non-zero temperature (which controls the randomness of token generation).

This is the main idea behind GRPO: getting rid of the value model.

Finally, GRPO adds a KL divergence term (to be exact, GRPO uses a simple approximation of the KL divergence to improve the algorithm further) directly into its objective, comparing the current policy to a reference policy (often the post-SFT model).
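The article does not spell out the approximation. The sketch below assumes the per-token estimator given in the DeepSeekMath paper, computed from the probabilities that the current and reference policies assign to the same token; the function name and the example probabilities are hypothetical.

```python
import math

def kl_estimate(prob_current: float, prob_reference: float) -> float:
    """Per-token estimate: ref/cur - log(ref/cur) - 1.

    The value is zero when both policies agree on the token and positive
    otherwise, so it can be used directly as a penalty.
    """
    ratio = prob_reference / prob_current
    return ratio - math.log(ratio) - 1.0

print(kl_estimate(0.3, 0.3))  # identical probabilities
print(kl_estimate(0.5, 0.2))  # current policy favors the token more
print(kl_estimate(0.2, 0.5))  # current policy favors the token less
```

Because the estimate never goes negative, drifting away from the reference policy in either direction is penalized.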

See the final formulation below:

Image by the Author
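For reference, the GRPO objective can be written out as below. This follows the form given in the DeepSeekMath paper, so the symbols ε (clip range), β (weight of the KL term), q (query), and oᵢ (the i-th sampled response) come from that formulation rather than from the text of this article.

```latex
J_{\text{GRPO}}(\theta) =
\mathbb{E}_{q,\ \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\text{old}}}}
\left[
  \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
  \Big(
    \min\!\big( \rho_{i,t}\, A_i,\ \operatorname{clip}(\rho_{i,t},\, 1-\varepsilon,\, 1+\varepsilon)\, A_i \big)
    - \beta\, D_{\mathrm{KL}}\!\left( \pi_\theta \,\|\, \pi_{\text{ref}} \right)
  \Big)
\right],
\qquad
\rho_{i,t} = \frac{\pi_\theta(o_{i,t} \mid q,\, o_{i,<t})}{\pi_{\theta_{\text{old}}}(o_{i,t} \mid q,\, o_{i,<t})}
```

The first term is the PPO-style clipped surrogate using the group-relative advantage Aᵢ, and the second term is the KL penalty toward the reference policy.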

And… that’s mostly it for GRPO! I hope this gives you a clear overview of the process: it still relies on the same foundational ideas as TRPO and PPO but introduces additional improvements to make training more efficient, faster, and cheaper — key factors behind DeepSeek’s success.

Conclusion

Reinforcement Learning has become a cornerstone for training today’s Large Language Models, particularly through PPO, and more recently GRPO. Each method rests on the same RL fundamentals — states, actions, rewards, and policies — but adds its own twist to balance stability, efficiency, and human alignment:

  • TRPO introduced strict policy constraints via KL divergence.
  • PPO eased those constraints with a clipped objective.
  • GRPO took an extra step by removing the value model requirement and using group-based reward normalization.

Of course, DeepSeek also benefits from other innovations, like high-quality data and other training strategies, but that is for another time!

I hope this article gave you a clearer picture of how these methods connect and evolve. I believe that Reinforcement Learning will become the main focus in training LLMs to improve their performance, surpassing pre-training and SFT in driving future innovations. 

If you’re interested in diving deeper, feel free to check out the references below or explore my previous posts.

Thanks for reading, and feel free to leave a clap and a comment!


Want to learn more about Transformers or dive into the math behind the Curse of Dimensionality? Check out my previous articles:

Transformers: How Do They Transform Your Data?
Diving into the Transformers architecture and what makes them unbeatable at language tasks (towardsdatascience.com)

The Math Behind “The Curse of Dimensionality”
Dive into the “Curse of Dimensionality” concept and understand the math behind all the surprising phenomena that arise… (towardsdatascience.com)



References:

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Top quantum breakthroughs of 2025

The Helios quantum computing platform is available to customers through Quantinuum’s cloud service and on-premises offering. HSBC is using IBM’s Heron quantum computer to improve their bond trading predictions by 34% compared to classical computing. Caltech physicists create 6,100-qubit array. Kon H. Leung is seen working on the apparatus used

Read More »

How enterprises are rethinking online AI tools

A second path enterprises like had only about 35% buy-in, but generated the most enthusiasm. It is to use an online AI tool that offers more than a simple answer to a question, something more like an “interactive AI agent” than a chatbot. Two that got all the attention are

Read More »

[Podcast] How Utilities Are Planning for Demand

From the electrification of transportation to the heating and cooling of data centers, utilities across the U.S. face the challenge of meeting surging demand for electricity. “How Utilities Are Planning for Demand” is a three-part podcast series that examines the increasingly complex utility sector landscape. The series features the insights of utility experts addressing industry-critical topics, including the vital role of smart planning in meeting historic demand, how to meet demand on accelerated timelines and the grid of tomorrow. Check out the podcast episodes! ⬆ <!– Ep. 3 Planning the Grid of Tomorrow –> <!– Ep. 2 The Need for Speed –> Ep. 1 The Big Picture: Why Smart Planning is Key to Meeting Historic Demand <!– Ep. 3 Planning the Grid of Tomorrow Meeting today’s soaring electricity demand is challenging enough — but what about the decades ahead? In this episode, industry experts explore how long-term planning, transmission buildouts, and advanced tools can give utilities the confidence to invest wisely and prepare for a future shaped by renewables, bidirectional power flows, and extreme weather. COMING SOON –> <!– Ep. 2 The Need for Speed What makes many emerging utility customers different today? They need a lot of electricity fast. In this episode, we explore how utilities can respond quickly to meet the accelerating load growth. Listen to “The Need for Speed” on Spreaker. –> Ep. 1 The Big Picture: Why Smart Planning is Key to Meeting Historic Demand Whether it’s growing demand, an aging workforce and grid, or the influx of renewable energy, the utility sector is more complex than ever. In this episode, we will explore the broad range of issues utility leaders must navigate to reliably and affordably deliver growing amounts of electricity to customers.

Read More »

EIA Raises Oil Price Forecasts but Still Sees Drop in 2026

In its latest short term energy outlook (STEO), which was released on November 12, the U.S. Energy Information Administration (EIA) increased its Brent price forecast for 2025 and 2026 but still projected that the commodity will drop next year compared to 2025. According to its November STEO, the EIA now sees the Brent spot price averaging $68.76 per barrel this year and $54.92 per barrel next year. In its previous STEO, which was released in October, the EIA projected that the Brent spot price would average $68.64 per barrel in 2025 and $52.16 per barrel in 2026. A quarterly breakdown included in the EIA’s latest STEO projected that the Brent spot price will come in at $62.52 per barrel in the fourth quarter of this year, $54.30 per barrel in the first quarter of next year, $54.02 per barrel in the second quarter, $55.32 per barrel in the third quarter, and $56.00 per barrel in the fourth quarter of 2026. In its previous STEO, the EIA forecast that the Brent spot price would average $62.05 per barrel in the fourth quarter of 2025, $51.97 per barrel in the first quarter of 2026, $51.67 per barrel in the second quarter, $52.00 per barrel in the third quarter, and $53.00 per barrel in the fourth quarter. The EIA highlighted in its latest STEO that Brent crude oil spot prices averaged $65 per barrel in October, which it pointed out was $3 per barrel less than the average in September and $15 per barrel less than the average in January 2025. “Crude oil prices fell in October as growing supplies of crude oil outweighed uncertainties related to the effect of new rounds of sanctions on Russia’s oil sector,” the EIA said in its November STEO. “We forecast that growing global oil production and

Read More »

Energy Department Strengthens Puerto Rico’s Energy Grid with Renewed Orders

As the island continues to repair critical infrastructure and prepares for next summer’s peak demand season, additional emergency orders are needed in order to strengthen Puerto Rico’s electric grid. WASHINGTON—The U.S. Department of Energy (DOE) today renewed two emergency orders to further strengthen Puerto Rico’s electric grid as the island prepares for rising demand and seasonal storm risks. Building on previous actions in May and August 2025, DOE’s emergency orders authorize the Puerto Rico Electric Power Authority (PREPA) to dispatch generation units essential for maintaining critical generation capacity, while accelerating vegetation management to reduce outages and strengthen long-term grid reliability. “Modernizing Puerto Rico’s energy grid is essential to achieving long-term reliability and affordability for the Commonwealth,” said U.S. Secretary of Energy Chris Wright. “Our team is working with local and federal partners to boost power generation and accelerate vegetation management efforts to strengthen Puerto Rico’s electrical grid. The Trump Administration is fully committed to delivering affordable, reliable and secure energy to all Americans.” This year, DOE’s emergency orders and actions assisted the Puerto Rican government in restoring up to 820 MW of baseload generation capacity in Puerto Rico, resulting in an approximate 13% increase to the island’s systemwide generation capacity of 6,460 MW. With DOE funding, PREPA was able to bring a key unit back online after being inoperative for more than two years–strengthening Puerto Rico’s grid. These orders also address vegetation management issues near high-voltage lines. Falling tree limbs or brush falling during Puerto Rico’s frequent storms and high winds can damage transmission lines, cause widespread outages and potentially cause wildfires. Addressing these hazards to public health and safety is critically important. Additional information can be found here. 
“The Department of Energy’s 202(c) emergency orders have provided concrete benefits for Puerto Rico, allowing us to restore 1,200 MW of

Read More »

Chevron Chooses West Texas for 1st AI Data Center Power Project

Chevron Corp. chose West Texas as the site of its first project to provide natural gas-fired power to a data center, the beginning of a new line of business for the oil giant to capitalize on the boom in artificial intelligence.  The company is in exclusive talks with the data center’s end user, which it didn’t name, and anticipates making a final investment decision early next year, according to a statement and presentation released ahead of Chevron’s investor day on Wednesday. The facility is expected be operational in 2027, and will have capacity to generate as much as 5,000 megawatts in the future. Big Oil is looking to cash in on the enormous demand for energy that will be needed to power data centers, which are being located further away from major population centers and closer to sources of fuel. Chevron is one of the biggest producers in the Permian Basin of West Texas, which spews out so much natural gas that it often overwhelms pipelines and has to be burned off. “We’ve got the gas,” Chief Financial Officer Eimear Bonner said in an interview prior to Chevron’s investor presentation in New York on Wednesday. “We are uniquely positioned to have a very competitive project.” The power project is expected to ramp up by its third year to have the capacity to produce about 2,500 megawatts, which is more than the equivalent of two nuclear reactors. It will likely be built separately from the grid to avoid competing with electricity supply for the wider population. Chevron sees an opportunity to secure demand for its 3 billion cubic feet per day of natural gas output. The stock declined 1.7% by 10 a.m. in New York, with Brent crude prices trading down 2.7% at $63.41 a barrel.  Key to Chevron’s venture into AI

Read More »

Constellation Energy looks to demand response to accommodate load growth

BY THE NUMBERS: Constellation Energy Q3 2025 5.8 GW Power generation and battery storage Constellation is proposing to bring online in Maryland to meet rising demand for electricity. 1 GW The capacity the utility hopes to add to its demand response programs. “A full nuclear unit’s worth of output,” said Jim McHugh, the company’s chief commercial officer. 19 TWh Electric load served to the Mid-Atlantic in Q3 2025. 46,477 GWh Produced by Constellation’s nuclear fleet in the third quarter, compared with 45,510 GWh last year. ‘Strong pipeline’ for demand response products Constellation Energy is “seeing a lot of great capability to use backup generation and flex compute,” President and CEO Joe Dominguez said in the company’s third-quarter earnings call on Friday. He added, however, that “I don’t think we’re going to get to a point where we could flex on and off the full output of data centers.” Dominguez said the company is exploring using artificial intelligence to “attract some of our other customers to actually providing the relief or the slack on the system during the key hours,” then “use their own backup generation or curtail their own consumption of energy during peak hours.” A lot of Constellation’s customers have shown interest in demand response products, and the pipeline for that “looks really strong right now,” said Chief Commercial Officer Jim McHugh. “We’ve found this kind of unique opportunity,” McHugh said. “We’re trying to be innovative around the product structure itself … We started executing [deals], working towards 1,000 MW or so in between now and the next couple of capacity auctions.” McHugh noted that 1 GW “portends to look like a full nuclear unit’s worth of output in terms of demand response.” “I think we’re still in the early days of this,” he said. “I think the combination of

Read More »

Energy Department Awards Contracts to Begin Refilling the Strategic Petroleum Reserve

The U.S. Department of Energy announced contracts have been awarded for the acquisition of one million barrels of crude oil for the Strategic Petroleum Reserve (SPR).  WASHINGTON— The U.S. Department of Energy (DOE) today announced that contracts have been awarded for the acquisition of approximately one million barrels of crude oil for the Strategic Petroleum Reserve (SPR). The contracts awarded on November 12, 2025, are for deliveries beginning in December 2025 through January 2026 to the Bryan Mound site. This announcement follows the Request for Proposal (RFP) that was announced on October 21, 2025.  President Trump promised to refill the SPR and rebuild America’s strategic strength. Currently, the SPR holds just over 400 million barrels out of its capacity of approximately 700 million barrels. The SPR was severely weakened by the previous administration’s reckless 180-million-barrel drawdown in 2022, which incurred nearly $280 million in costs, delayed critical infrastructure maintenance and put unprecedented wear and tear on storage and injection facilities.   “President Trump promised to protect America’s energy security by refilling and managing the Strategic Petroleum Reserve more responsibly,” said U.S. Secretary of Energy Chris Wright. “Awarding these contracts marks another step in the important process of refilling this national security asset. While this process won’t be complete overnight, these actions are an important step in strengthening our energy security and reversing the costly and irresponsible energy policies of the last administration.”  In response to the RFP, DOE received eighteen offers from six companies and awarded contracts to the most competitive bids that met all quality and specification requirements. Crude oil deliveries to the Bryan Mound SPR site are scheduled from December 1, 2025 through January 31, 2026.  
For more information on the SPR please visit Infographic: Strategic Petroleum Reserve and Fact Sheet: Strategic Petroleum Reserve. ###

Read More »

When the Cloud Leaves Earth: Google and NVIDIA Test Space Data Centers for the Orbital AI Era

On November 4, 2025, Google unveiled Project Suncatcher, a moonshot research initiative exploring the feasibility of AI data centers in space. The concept envisions constellations of solar-powered satellites in Low Earth Orbit (LEO), each equipped with Tensor Processing Units (TPUs) and interconnected via free-space optical laser links. Google’s stated objective is to launch prototype satellites by early 2027 to test the idea and evaluate scaling paths if the technology proves viable. Rather than a commitment to move production AI workloads off-planet, Suncatcher represents a time-bound research program designed to validate whether solar-powered, laser-linked LEO constellations can augment terrestrial AI factories, particularly for power-intensive, latency-tolerant tasks. The 2025–2027 window effectively serves as a go/no-go phase to assess key technical hurdles including thermal management, radiation resilience, launch economics, and optical-link reliability. If these milestones are met, Suncatcher could signal the emergence of a new cloud tier: one that scales AI with solar energy rather than substations. Inside Google’s Suncatcher Vision Google has released a detailed technical paper titled “Towards a Future Space-Based, Highly Scalable AI Infrastructure Design.” The accompanying Google Research blog describes Project Suncatcher as “a moonshot exploring a new frontier” – an early-stage effort to test whether AI compute clusters in orbit can become a viable complement to terrestrial data centers. The paper outlines several foundational design concepts: Orbit and Power Project Suncatcher targets Low Earth Orbit (LEO), where solar irradiance is significantly higher and can remain continuous in specific orbital paths. Google emphasizes that space-based solar generation will serve as the primary power source for the TPU-equipped satellites. 
Compute and Interconnect Each satellite would host Tensor Processing Unit (TPU) accelerators, forming a constellation connected through free-space optical inter-satellite links (ISLs). Together, these would function as a disaggregated orbital AI cluster, capable of executing large-scale batch and training workloads. Downlink

Read More »

Cloud-based GPU savings are real – for the nimble

The pattern points to an evolving GPU ecosystem: while top-tier chips like Nvidia’s new GB200 Blackwell processors remain in extremely short supply, older models such as the A100 and H100 are becoming cheaper and more available. Yet, customer behavior may not match practical needs. “Many are buying the newest GPUs because of FOMO—the fear of missing out,” he added. “ChatGPT itself was built on older architecture, and no one complained about its performance.” Gil emphasized that managing cloud GPU resources now requires agility, both operationally and geographically. Spot capacity fluctuates hourly or even by the minute, and availability varies across data center regions. Enterprises willing to move workloads dynamically between regions—often with the help of AI-driven automation—can achieve cost reductions of up to 80%. “If you can move your workloads where the GPUs are cheap and available, you pay five times less than a company that can’t move,” he said. “Human operators can’t respond that fast automation is essential.” Conveniently, Cast sells an AI automation solution. But it is not the only one and the argument is valid. If spot pricing can be found cheaper at another location, you want to take it to keep the cloud bill down/ Gil concluded by urging engineers and CTOs to embrace flexibility and automation rather than lock themselves into fixed regions or infrastructure providers. “If you want to win this game, you have to let your systems self-adjust and find capacity where it exists. That’s how you make AI infrastructure sustainable.”

Read More »

Harnessing Gravity: RRPT Hydro Reimagines Data Center Power

At the 2025 Data Center Frontier Trends Summit, amid panels on AI, nuclear, and behind-the-meter power, few technologies stirred more curiosity than a modular hydropower system without dams or flowing rivers. That concept—piston-driven hydropower—was presented by Expanse Energy Corporation President and CEO Ed Nichols and Chief Electrical Engineer Gregory Tarver during the Trends Summit’s closing “6 Moonshots for the 2026 Data Center Frontier” panel. Nichols and Tarver joined the Data Center Frontier Show recently to discuss how their Reliable Renewable Power Technology (RRPT Hydro) platform could rewrite the economics of clean, resilient power for the AI era. A New Kind of Hydropower Patented in the U.S. and entering commercial readiness, RRPT Hydro’s system replaces flowing water with a gravity-and-buoyancy engine housed in vertical cylinders. Multiple pistons alternately sink and rise inside these cylinders—heavy on the downward stroke, buoyant on the upward—creating continuous motion that drives electrical generation. “It’s not perpetual motion,” Nichols emphasizes. “You need a starter source—diesel, grid, solar, anything—but once in motion, the system sustains itself, converting gravity’s constant pull and buoyancy’s natural lift into renewable energy.” The concept traces its roots to a moment of natural awe. Its inventor, a gas-processing engineer, was moved to action by the 2004 Boxing Day tsunami, seeking a way to “containerize” and safely harvest the vast energy seen in that disaster. Two decades later, that spark has evolved into a patented, scalable system designed for industrial deployment. Physics-Based Power: Gravity Down, Buoyancy Up Each RRPT module operates as a closed-loop hydropower system: On the downstroke, pistons filled with water become dense and fall under gravity, generating kinetic energy. On the upstroke, air ballast tanks lighten the pistons, allowing buoyant forces to restore potential energy. 
By combining gravitational and buoyant forces—both constant, free, and renewable—RRPT converts natural equilibrium into sustained mechanical power.

Read More »
