Training Large Language Models: From TRPO to GRPO

DeepSeek has recently made quite a buzz in the AI community, thanks to its impressive performance at relatively low cost. I think this is a perfect opportunity to dive deeper into how Large Language Models (LLMs) are trained. In this article, we will focus on the Reinforcement Learning (RL) side of things: we will cover TRPO, PPO, and, more recently, GRPO (don’t worry, I will explain all these terms soon!).

I have aimed to keep this article relatively easy to read and accessible, by minimizing the math, so you won’t need a deep Reinforcement Learning background to follow along. However, I will assume that you have some familiarity with Machine Learning, Deep Learning, and a basic understanding of how LLMs work.

I hope you enjoy the article!

The 3 steps of LLM training

Figure: The 3 steps of LLM training [1]

Before diving into RL specifics, let’s briefly recap the three main stages of training a Large Language Model:

  • Pre-training: the model is trained on a massive dataset to predict the next token in a sequence based on preceding tokens.
  • Supervised Fine-Tuning (SFT): the model is then fine-tuned on more targeted data and aligned with specific instructions.
  • Reinforcement Learning (often called RLHF, for Reinforcement Learning from Human Feedback): this is the focus of this article. The main goal is to further align the model’s responses with human preferences by letting the model learn directly from feedback.

Reinforcement Learning Basics

Figure: A robot trying to exit a maze! [2]

Before diving deeper, let’s briefly revisit the core ideas behind Reinforcement Learning.

RL is quite straightforward to understand at a high level: an agent interacts with an environment. The agent resides in a specific state within the environment and can take actions to transition to other states. Each action yields a reward from the environment: this is how the environment provides feedback that guides the agent’s future actions. 

Consider the following example: a robot (the agent) navigates (and tries to exit) a maze (the environment).

  • The state is the current situation of the environment (the robot’s position in the maze).
  • The robot can take different actions: for example, it can move forward, turn left, or turn right.
  • Successfully navigating towards the exit yields a positive reward, while hitting a wall or getting stuck in the maze results in negative rewards.
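To make this loop concrete, here is a minimal sketch in Python. The `Corridor` environment, its reward values, and the two policies are hypothetical choices made for illustration (a one-dimensional stand-in for the maze), not part of any library.

```python
import random


class Corridor:
    """Hypothetical toy environment: a row of cells with the exit at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0  # the state: the robot's current cell

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (state, reward, done)."""
        target = self.position + action
        if target < 0:
            return self.position, -1.0, False  # bumped into the wall
        self.position = target
        if self.position == self.length - 1:
            return self.position, 10.0, True   # reached the exit
        return self.position, -0.1, False      # small cost for every extra move


def run_episode(env, policy, max_steps=100):
    """Let the agent act until it exits or runs out of steps; return the total reward."""
    total = 0.0
    state = env.position
    for _ in range(max_steps):
        action = policy(state)                  # the agent picks an action
        state, reward, done = env.step(action)  # the environment answers
        total += reward
        if done:
            break
    return total


def random_policy(state):
    return random.choice([-1, 1])


def always_right(state):
    return 1


print("random policy:", run_episode(Corridor(), random_policy))
print("always right: ", run_episode(Corridor(), always_right))
```

The policy that always moves right collects a higher total reward than a typical random walk, and that difference is exactly the signal an RL algorithm uses to improve the agent.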

Easy! Now, let’s draw an analogy with how RL is used in the context of LLMs.

RL in the context of LLMs

Figure: Simplified RLHF Process [3]

When used during LLM training, RL is defined by the following components:

  • Agent: the LLM itself.
  • Environment: everything external to the LLM, including user prompts, feedback systems, and other contextual information. This is basically the framework the LLM is interacting with during training.
  • Actions: these are responses to a query from the model. More specifically: these are the tokens that the LLM decides to generate in response to a query.
  • State: the current query being answered along with tokens the LLM has generated so far (i.e., the partial responses).
  • Rewards: these are a bit trickier. Unlike the maze example above, there is usually no binary reward. In the context of LLMs, rewards usually come from a separate reward model, which outputs a score for each (query, response) pair. This model is trained on human-annotated data (hence “RLHF”) in which annotators rank different responses. The goal is for higher-quality responses to receive higher rewards.

Note: in some cases, rewards can actually be simpler. For example, in DeepSeekMath, rule-based rewards can be used because math responses tend to be more deterministic (the answer is either correct or wrong).

Policy is the final concept we need for now. In RL terms, a policy is simply the strategy for deciding which action to take. In the case of an LLM, the policy outputs a probability distribution over possible tokens at each step: in short, this is what the model uses to sample the next token to generate. Concretely, the policy is determined by the model’s parameters (weights). During RL training, we adjust these parameters so the LLM becomes more likely to produce “better” tokens, that is, tokens that lead to higher reward scores.

We often write the policy as:

$$\pi_\theta(a \mid s)$$

where a is the action (a token to generate), s is the state (the query and the tokens generated so far), and θ denotes the model’s parameters.
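As a rough illustration of what “sampling from the policy” means, the sketch below turns a handful of made-up scores (logits) into probabilities with a softmax and samples one token. The vocabulary, the logits, and the function names are assumptions for the example; a real LLM computes the logits from its parameters θ and the state s.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature=1.0):
    """Sample one token according to the policy's probabilities."""
    probs = softmax(logits, temperature)
    return random.choices(vocab, weights=probs, k=1)[0]


# Hypothetical scores for the next token after the query "What is 2+2?"
vocab = ["4", "5", "four", "banana"]
logits = [3.0, 0.5, 2.0, -2.0]

for token, prob in zip(vocab, softmax(logits)):
    print(f"{token!r}: {prob:.3f}")
print("sampled:", sample_token(vocab, logits))
```

RL training nudges θ so that the probabilities of tokens leading to high rewards go up and the others go down.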

This idea of finding the best policy is the whole point of RL! Since we don’t have labeled data (like we do in supervised learning) we use rewards to adjust our policy to take better actions. (In LLM terms: we adjust the parameters of our LLM to generate better tokens.)

TRPO (Trust Region Policy Optimization)

An analogy with supervised learning

Let’s take a quick step back to how supervised learning typically works: you have labeled data and use a loss function (like cross-entropy) to measure how close your model’s predictions are to the true labels.

We can then use algorithms like backpropagation and gradient descent to minimize our loss function and update the weights θ of our model.

Recall that our policy also outputs probabilities! In that sense, it is analogous to the model’s predictions in supervised learning… We are tempted to write something like:

$$L(\theta) = -\,\mathbb{E}\left[\log \pi_\theta(a \mid s)\, A(s, a)\right]$$

where s is the current state and a is a possible action.

A(s, a) is called the advantage function and measures how good the chosen action is in the current state, compared to a baseline. It plays a role very much like labels in supervised learning, but it is derived from rewards instead of explicit labeling. To simplify, we can write the advantage as:

$$A(s, a) = \text{reward}(s, a) - \text{baseline}(s)$$

In practice, the baseline is calculated using a value function. This is a common term in RL that I will explain later. What you need to know for now is that it measures the expected reward we would receive if we continue following the current policy from the state s.

What is TRPO?

TRPO (Trust Region Policy Optimization) builds on this idea of using the advantage function but adds a critical ingredient for stability: it constrains how far the new policy can deviate from the old policy at each update step (in the same spirit as keeping the step size small in gradient descent).

  • It introduces a KL divergence term (think of it as a measure of how different two distributions are) between the current and the old policy, and requires it to stay below a threshold.
  • It also divides the current policy’s probability by the old policy’s probability. This ratio, multiplied by the advantage function, gives us a sense of how beneficial each update is relative to the old policy.

Putting it all together, TRPO tries to maximize a surrogate objective (which involves the advantage and the policy ratio) subject to a KL divergence constraint.
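In symbols, this constrained problem is commonly written as follows. This is the standard textbook form, given here as a sketch: δ (the maximum allowed KL divergence, i.e. the size of the trust region) is a symbol not used elsewhere in this article, and θ_old denotes the parameters before the update.

```latex
\max_{\theta}\;
\mathbb{E}_{s,\,a \sim \pi_{\theta_{\text{old}}}}
\left[
  \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A(s, a)
\right]
\quad \text{subject to} \quad
\mathbb{E}_{s}
\left[
  D_{\mathrm{KL}}\!\left(
    \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)
  \right)
\right] \le \delta
```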

PPO (Proximal Policy Optimization)

While TRPO was a significant advancement, it’s no longer used widely in practice, especially for training LLMs, due to its computationally intensive gradient calculations.

Instead, PPO is now the preferred approach for training most LLMs, including ChatGPT, Gemini, and more.

It is actually quite similar to TRPO, but instead of enforcing a hard constraint on the KL divergence, PPO introduces a “clipped surrogate objective” that implicitly restricts policy updates, and greatly simplifies the optimization process.

Here is a breakdown of the PPO objective function we maximize to tweak our model’s parameters.

Figure: breakdown of the PPO objective function (image by the author)
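For readers who prefer code to formulas, here is a minimal sketch of the clipped term for a single token, in plain Python. `ratio` is the probability ratio between the new and the old policy, `advantage` is A(s, a), and `epsilon` is the clipping range (0.2 is a commonly used value, picked here as an assumption). The function names are mine.

```python
def ppo_clipped_term(ratio, advantage, epsilon=0.2):
    """Per-token PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_objective(ratios, advantages, epsilon=0.2):
    """Average the clipped terms over a batch of tokens (the quantity to maximize)."""
    terms = [ppo_clipped_term(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


# A good action (A > 0) whose probability already grew by 50%: the gain is capped.
print(ppo_clipped_term(ratio=1.5, advantage=1.0))
# A bad action (A < 0) whose probability grew by 50%: the full penalty is kept.
print(ppo_clipped_term(ratio=1.5, advantage=-1.0))
```

With a positive advantage, pushing the ratio above 1 + ε brings no extra gain (a ratio of 1.5 is scored as if it were 1.2), which removes the incentive to move too far from the old policy in a single update.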

GRPO (Group Relative Policy Optimization)

How is the value function usually obtained?

Let’s first talk more about the advantage and the value functions I introduced earlier.

In typical setups (like PPO), a value model is trained alongside the policy. Its goal is to predict the value of each action we take (each token generated by the model), using the rewards we obtain (remember that the value should represent the expected cumulative reward).

Here is how it works in practice. Take the query “What is 2+2?” as an example. Our model outputs “2+2 is 4” and receives a reward of 0.8 for that response. We then go backward and attribute discounted rewards to each prefix:

  • “2+2 is 4” gets a value of 0.8
  • “2+2 is” (1 token backward) gets a value of 0.8γ
  • “2+2” (2 tokens backward) gets a value of 0.8γ²
  • etc.

where γ is the discount factor (0.9 for example). We then use these prefixes and associated values to train the value model.
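The backward attribution above can be sketched in a few lines of Python. Splitting on whitespace stands in for real tokenization, and `prefix_values` is a hypothetical helper written for this example.

```python
def prefix_values(tokens, final_reward, gamma=0.9):
    """Pair each prefix with its discounted value, longest prefix first."""
    values = []
    for steps_back in range(len(tokens)):
        prefix = " ".join(tokens[: len(tokens) - steps_back])
        values.append((prefix, final_reward * gamma ** steps_back))
    return values


tokens = ["2+2", "is", "4"]  # simplistic "tokens", for illustration only
for prefix, value in prefix_values(tokens, final_reward=0.8):
    print(f"{prefix!r}: {value:.3f}")
```

With γ = 0.9, the three prefixes get values of 0.800, 0.720, and 0.648. These (prefix, value) pairs are the training examples for the value model.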

Important note: the value model and the reward model are two different things. The reward model is trained before the RL process, using (query, response) pairs and human rankings. The value model is trained concurrently with the policy and aims to predict the expected future reward at each step of the generation process.

What’s new in GRPO

Even if, in practice, the reward model is often derived from the policy (training only the “head”), we still end up maintaining many models and handling multiple training procedures (policy, reward, and value models). GRPO streamlines this with a more efficient way of computing the advantage.

Remember that the advantage compares the reward of an action to a baseline.

In PPO, we used the value function as that baseline. GRPO chooses something else: for each query, it generates a group of responses (a group of size G) and uses their rewards to calculate each response’s advantage as a z-score:

$$A_i = \frac{r_i - \mu}{\sigma}$$

where rᵢ is the reward of the i-th response, and μ and σ are the mean and standard deviation of the rewards in that group.

This naturally eliminates the need for a separate value model. This idea makes a lot of sense when you think about it! The group mean plays the same role as the value function we introduced before: it measures, in a sense, the “expected” reward we can obtain for that query. This method is also well suited to our problem, because LLMs can easily generate multiple different responses to the same query by sampling with a non-zero temperature (the parameter that controls the randomness of token generation).
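Here is a minimal sketch of this group-relative advantage in plain Python. It assumes the population standard deviation and adds a small `eps` to avoid dividing by zero when every response in the group gets the same reward; both are choices made for this example, and actual implementations may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Z-score each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std: an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for G = 4 responses sampled for the same query
rewards = [0.2, 0.8, 0.5, 0.5]
for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
    print(f"reward {reward:.1f} -> advantage {advantage:+.2f}")
```

Responses that beat the group average get a positive advantage and become more likely, while below-average responses get a negative one. No value model is involved: the group itself provides the baseline.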

This is the main idea behind GRPO: getting rid of the value model.

Finally, GRPO adds a KL divergence term directly into its objective, comparing the current policy to a reference policy (often the post-SFT model). To be exact, GRPO uses a simple approximation of the KL divergence rather than the exact value.
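For the curious, the approximation described in the DeepSeekMath paper takes, as far as I understand it, the form ratio − log(ratio) − 1, where ratio is the reference policy’s probability of a token divided by the current policy’s. Below is a minimal per-token sketch working with log-probabilities; the function name and the example probabilities are hypothetical.

```python
import math


def kl_estimate(logp_current, logp_reference):
    """Per-token estimate: ratio - log(ratio) - 1, with ratio = pi_ref / pi_theta."""
    log_ratio = logp_reference - logp_current
    return math.exp(log_ratio) - log_ratio - 1.0


# Same probability under both policies: no penalty.
print(kl_estimate(math.log(0.5), math.log(0.5)))
# The current policy doubled the token's probability: a small positive penalty.
print(kl_estimate(math.log(0.5), math.log(0.25)))
```

This estimate is never negative and is zero only when both policies assign the same probability to the token, which is what we want from a penalty that keeps the model close to its reference.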

See the final formulation below:

Figure: the final GRPO objective (image by the author)
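Written out, the objective has the following shape. I follow the notation of the DeepSeekMath paper as best I can, so treat it as a sketch: q is the query, oᵢ the i-th response in the group, ρ is shorthand (mine) for the probability ratio of token t in response i, Â is the group-relative advantage (the z-score of the response’s reward, shared by all its tokens), ε is the clipping range, and β weights the KL penalty.

```latex
J_{\text{GRPO}}(\theta) =
\mathbb{E}\left[
  \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
  \Big(
    \min\big(
      \rho_{i,t}\,\hat{A}_{i,t},\;
      \operatorname{clip}(\rho_{i,t},\, 1-\varepsilon,\, 1+\varepsilon)\,\hat{A}_{i,t}
    \big)
    - \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\text{ref}}\right)
  \Big)
\right],
\qquad
\rho_{i,t} =
\frac{\pi_\theta(o_{i,t} \mid q,\, o_{i,<t})}
     {\pi_{\theta_{\text{old}}}(o_{i,t} \mid q,\, o_{i,<t})}
```

The first term is the PPO clipped surrogate, averaged over the tokens of each response and over the group; the second is the KL penalty towards the reference policy.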

And… that’s mostly it for GRPO! I hope this gives you a clear overview of the process: it still relies on the same foundational ideas as TRPO and PPO but introduces additional improvements to make training more efficient, faster, and cheaper — key factors behind DeepSeek’s success.

Conclusion

Reinforcement Learning has become a cornerstone for training today’s Large Language Models, particularly through PPO, and more recently GRPO. Each method rests on the same RL fundamentals — states, actions, rewards, and policies — but adds its own twist to balance stability, efficiency, and human alignment:

  • TRPO introduced strict policy constraints via KL divergence.
  • PPO eased those constraints with a clipped objective.
  • GRPO took an extra step by removing the value model requirement and using group-based reward normalization.

Of course, DeepSeek also benefits from other innovations, like high-quality data and other training strategies, but that is for another time!

I hope this article gave you a clearer picture of how these methods connect and evolve. I believe that Reinforcement Learning will become the main focus in training LLMs to improve their performance, surpassing pre-training and SFT in driving future innovations. 

If you’re interested in diving deeper, feel free to check out the references below or explore my previous posts.

Thanks for reading, and feel free to leave a clap and a comment!


References:

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Nvidia claims near 50% boost in AI storage speed

Storage is an overlooked element of AI that has been overshadowed by all the emphasis on processors, namely GPUs. Large language models (LLMs) measure in the terabytes of size and all that needs to be moved around to be processed. So the faster you can move data, the better, so

Read More »

Kyndryl expands Palo Alto deal to offer managed SASE service

Kyndryl has expanded its alliance with Palo Alto Networks to add secure access service edge (SASE) services to its managed services offerings. In 2023, when Kyndryl first said it would integrate Palo Alto’s security products and services into its own managed security services, the vendors said they would ultimately support

Read More »

AI-Powered Policing: The Future of Traffic Safety in Kazakhstan

Traffic management is a growing challenge for cities worldwide, requiring a balance between enforcement, efficiency, and public trust. In Kazakhstan, the Qorgau system is redefining road safety through an innovative fusion of artificial intelligence (AI), computer vision, and mobile technology. Designed to assist traffic police in real-time violation detection and

Read More »

Quantum networking advances on Earth and in space

“Currently, the U.S. government is not investing in such testbeds or demonstrations, ensuring it will be a follower and not a leader in the development of technical advances in the field,” said a report released last year by the Quantum Economic Development Consortium, a global association of more than 250

Read More »

ADNOC Gas Posts Record $5 Billion Annual Profit Driven by Domestic Demand

ADNOC Gas PLC achieved its highest-ever yearly net income in 2024 at $5 billion, driven by natural gas consumption in the United Arab Emirates. Net earnings for the fourth quarter of 2024 totaled $1.38 billion, ADNOC Gas’ highest quarterly result since its public listing in 2023, the company reported on its website. “The company’s strong performance was underpinned by robust demand for domestic gas which supported volume growth and improved pricing”, said ADNOC Gas, the integrated gas processing arm of Abu Dhabi National Oil Co. (ADNOC). Annual sales volumes grew two percent to 3,616 million MMBtu. ADNOC Gas supplies about 60 percent of the UAE’s sales gas needs, as well as supplies over twenty countries, according to the company. Adjusted revenue for 2024 rose seven percent year-on-year to $24.43 billion. “The company’s strong top-line performance for 2024 translated into a strong EBITDA growth of 14 percent to $8.65 billion with a high, stable margin of 35 percent”, ADNOC Gas said. For the fourth quarter, adjusted revenue was $6.06 billion and EBITDA $2.28 billion. “The robust improvement was driven by several factors including a richer mix of gas, producing more liquids, and improved commercial terms in the domestic market”, ADNOC Gas said. Year-end free cash flow was $4.58 billion, with the October-December period contributing $1.22 billion. ADNOC Gas declared a dividend of $3.41 billion for 2024, half of which was paid September 2024. It expects to distribute the remaining half this April. “The final dividend for FY 2024 is in line with the company’s robust policy to increase the annual dividend by 5 percent annually and reflects the company’s strong free cash flow, which exceeds the dividend commitment by over $1 billion”, it said. ADNOC Gas chief executive Fatema Al Nuaimi commented, “Our record-breaking fourth quarter results demonstrate our ability to

Read More »

Ørsted steered steady course through troubled waters in 2024

Ørsted (CPH: ORSTED) saw its revenues increase in 2024 compared to 2023, in line with expectations, though the company was still weighed down by impairments on its US projects. The company’s board of directors approved the annual report for 2024, which saw operating profit for 2024 hit 32 billion Danish kroner (£3.5b) compared to 18.7b kroner (£2b) in 2023. A total of 7.3b kroner (£814m) of this came from the company renegotiating and settling contracts related to the close-down of its US offshore wind development Ocean Wind with a better than assumed outcome. Furthermore, earnings from the company’s offshore sites amounted to 23.8b kroner (£2.6b), which was an increase of 3.6b kroner (£401m) compared to 2023. The increase was driven by the ramp-up of generation at the company’s offshore wind farms Greater Changhua 1 and 2a in Taiwan, South Fork in the US, and Gode Wind 3 in Germany, along with higher wind speeds and higher prices on its inflation-indexed CfDs and green certificates. However, impairments continued to dog the Danish wind developer in 2024, coming in at 15.6b kroner (£1.7b), with the majority (14.1b kroner, or £1.5b) relating to the company’s US projects. The US impairments were driven by an increase in the US long-dated interest rate, a lower market-informed valuation of its US seabeds, construction delays, and higher expected costs for Revolution Wind and Sunrise Wind projects in the US. Ørsted forecast that its 2025 EBITDA excluding new partnership agreements and cancellation fees will be in the 25-28b kroner (£2.7-3.1b) range, and gross investments are expected to be 50-54b kroner (£5.5-6b). Ørsted saw a change in CEO last week, with former head Mads Nipper leaving his position in favour of then deputy CEO and chief commercial officer Rasmus Errboe. Since the announcement of significant impairment on its US

Read More »

UK forecast to miss 2035 emissions-reduction targets

A report by DNV suggests gas consumption will fall to 12% by the end of the decade, more than double NESO’s recommendations. The UK is expected to miss its emissions-reduction targets under the recently committed Nationally Determined Contribution of an 81% reduction by 2035 compared to 1990 levels, according to a report by energy advisory DNV. Decarbonisation efforts are anticipated to lead to a reduction in greenhouse gas emissions of 68% by 2035, and 82% by 2050, against 1990 levels – but to fall short of net zero. Renewable energy capacity, including solar, onshore wind and offshore wind, is expected to double in the next six years to 90 GW by 2030. However, according to DNV estimates, this remains 45 GW short of government targets to double onshore wind, triple solar and quadruple offshore wind. The UK will have to reduce emissions by 62% from 405 million tonnes of carbon dioxide equivalent per year (MtCO2eq/yr) to 155 MtCO2eq/yr by 2035 in order to achieve national clean power targets. However, DNV’s medium-term forecast suggests that the UK will be able to reduce emissions by only 35% by 2035. Hari Vamadevan, the firm’s executive vice president and regional director in the UK and Ireland for energy systems, said: “Despite economic and geopolitical challenges, the UK’s trajectory remains positive. “A substantial green prize for our economy – cleaner and more affordable energy, is there for the taking if we can grasp it. We must act swiftly to ensure we make decisive moves along the correct path.” The proportion of fossil fuels in the energy supply mix is expected to fall from 75% today to 34% by 2050, with low-carbon sources predicted to “surpass fossil fuels in the supply mix”. But according to the advisory, oil and gas are expected to remain “dominant” sources

Read More »

Moldova Inks Agreement for EU Help toward Independence from Russian Energy

The European Commission has signed a deal with Moldova to help the country decouple from Russian energy supply and fully integrate into the European Union energy market. The agreement includes a EUR 250 million ($240.78 million) support package from the EU for 2025. Part of the funding has already been disbursed. Part of the funding meant for the Transnistrian region, or the Left Bank, is subject to the fulfilment of EU demands for fundamental freedoms and human rights, according to an online statement by the Commission. In the Right Bank, the agreement, called the Comprehensive Strategy for Energy Independence and Resilience of Moldova, will compensate excess electricity costs for households for up to 110 kilowatt hours every month until December 2025. The two-year strategy will also include a hardship fund to lighten the energy bills of the “most exposed households” and compensate for the entire increase in power costs for social institutions, including schools and hospitals. The plan has also allotted EUR 15 million for agro-food and manufacturing businesses. “Furthermore, through the mobilization of international financial institutions, additional funding of EUR 50 million will be available for sustainable investments in energy efficiency projects by local public authorities, households and SMEs [small and medium enterprises]”, the Commission said. In the Left Bank, EUR 60 million is conditionally available for over 350,000 people affected by the discontinuation of supply by Russia’s Gazprom PJSC, the Commission said. “This support is subject to steps being taken on fundamental freedoms and human rights in the Transnistrian region and excludes energy-intensive activities”, it said. Currently Moldova’s energy system is delivering power and heat without blackouts, according to the Commission. 
“In the longer term, the EU support will allow Moldova to improve its energy security through investments and reforms for the energy transition and ensure the full phase-out of Russian

Read More »

UK government urged to ‘make a decision’ quickly on zonal pricing

Delays to a decision on zonal electricity pricing are an “albatross around the neck” of UK renewable energy developers, a parliamentary committee has heard. Energy UK director of policy and advocacy Adam Berman said the review of electricity market arrangements (REMA) process has “gone on endlessly now”. Berman said industry developers want to see a quick decision on locational pricing reforms as well as more information on “crucial elements” of how the scheme may be grandfathered in for existing projects. He told the Energy Security and Net Zero committee hearing that “no one knows which way the government’s going to go” and called for clarity as soon as possible. Launched in 2022 under the previous Conservative government, the REMA process is exploring reform options which could split the UK into regional electricity markets. In December, the Department for Energy Security and Net Zero (DESNZ) said no decision has been taken between zonal pricing or “reformed national pricing”. However, DESNZ confirmed the current status quo is “not an option”. The issue of zonal pricing has divided the sector amid concerns over the potential impact on renewable energy investment, particularly in Scotland. AR7 and zonal pricing concerns Berman told Westminster MPs that unless a decision is made on zonal pricing soon, it could impact bids in the seventh renewable auction round (AR7) later this year. The UK government is aiming to achieve the “biggest and most successful” renewables auction round ever in 2025, but Berman warned developers could simply walk away without clarity on zonal pricing. “The government is not considering implementing any of this until after 2030, but nonetheless we have a lot of investment we need [up] until 2030,” he said. © Supplied by Ocean WindsInstallation of the first turbine at the Moray West offshore wind farm in Scotland. “I

Read More »

From waste to worth: the CO2 opportunity

Carbon dioxide has long been the villain of the world’s energy outlook. But what if it is more complex? What if industry’s most notorious waste product could be something more – could actually have value? From sustainable aviation fuel (SAF) to construction materials, to foodstuffs and more – companies are rethinking CO2 as a feedstock for the circular economy. Carbon capture and utilisation (CCU) is not new, but could play a crucial role in delivering net zero and challenges the idea that emissions must simply be buried. Size of the prize To have any chance at achieving net zero by 2050, the world will need to tackle its CO2 problem. The largest share of this is likely to be via carbon capture and storage (CCS), rather than CCU. The International Energy Agency puts current capacity at around 50 million tonnes per year (tpy). It projects this will reach 435mn tpy by 2030. Sasol carbon manager Kevin Dale raises some questions around how much CO2 an economy could use. “It has an appeal in these early stages because you can potentially produce a product with value,” he says. “For instance, we could make calcium carbonate grains, powders that can be used as fillers in products, paints or whatever. But the scale is going to be a problem.” He also cites other opportunities such as e-fuels or battery electrolytes. “But we need to be talking about putting away multiple gigatonnes via CCS. To put away very, very large quantities, what the geological storage offers is scale.” But that is not to say there is no case for utilisation. For one thing, it puts a value on a waste product. But it also helps start to tackle the world’s CO2 problem, rather than just waiting for a perfect solution. Thinking of carbon from

Read More »

Linux containers in 2025 and beyond

The upcoming years will also bring about an increase in the use of standard container practices, such as the Open Container Initiative (OCI) standard, container registries, signing, testing, and GitOps workflows used for application development to build Linux systems. We’re also likely see a significant rise in the use of bootable containers, which are self-contained images that can boot directly into an operating system or application environment. Cloud platforms are often the primary platform for AI experimentation and container development because of their scalability and flexibility along the integration of both AI and ML services. They’re giving birth to many significant changes in the way we process data. With data centers worldwide, cloud platforms also ensure low-latency access and regional compliance for AI applications. As we move ahead, development teams will be able to collaborate more easily through shared development environments and efficient data storage.

Read More »

Let’s Go Build Some Data Centers: PowerHouse Drives Hyperscale and AI Infrastructure Across North America

PowerHouse Data Centers, a leading developer and builder of next-generation hyperscale data centers and a division of American Real Estate Partners (AREP), is making significant strides in expanding its footprint across North America, initiating several key projects and partnerships as 2025 begins.  The new developments underscore the company’s commitment to advancing digital infrastructure to meet the growing demands of hyperscale and AI-driven applications. Let’s take a closer look at some of PowerHouse Data Centers’ most recent announcements. Quantum Connect: Bridging the AI Infrastructure Gap in Ashburn On January 17, PowerHouse Data Centers announced a collaboration with Quantum Connect to develop Ashburn’s first fiber hub specifically designed for AI and high-density workloads. This facility is set to provide 20 MW of critical power, with initial availability slated for late 2026.  Strategically located in Northern Virginia’s Data Center Alley, Quantum Connect aims to offer scalable, high-density colocation solutions, featuring rack densities of up to 30kW to support modern workloads such as AI inference, edge caching, and regional compute integration. Quantum Connect said it currently has 1-3 MW private suites available for businesses seeking high-performance infrastructure that bridges the gap between retail colocation and hyperscale facilities. “Quantum Connect redefines what Ashburn’s data center market can deliver for businesses caught in the middle—those too large for retail colocation yet underserved by hyperscale environments,” said Matt Monaco, Senior Vice President at PowerHouse Data Centers. “We’re providing high-performance solutions for tenants with demanding needs but without hyperscale budgets.” Anchored by 130 miles of private conduit and 2,500 fiber pathways, Quantum Connect’s infrastructure offers tenants direct, short-hop connections to adjacent facilities and carrier networks.  
With 14 campus entrances and secure, concrete-encased duct banks, the partners said the new facility minimizes downtime risks and reduces operational costs by eliminating the need for new optics or extended fiber runs.

Read More »

Blue Owl Swoops In As Major Backer of New, High-Profile, Sustainable U.S. Data Center Construction

With the global demand for data centers continuing to surge ahead, fueled by the proliferation of artificial intelligence (AI), cloud computing, and digital services, it is unsurprising that we are seeing aggressive investment strategies, beyond those of the existing hyperscalers. One of the dynamic players in this market is Blue Owl Capital, a leading asset management firm that has made significant strides in the data center sector. Back in October 2024 we reported on its acquisition of IPI Partners, a digital infrastructure fund manager, for approximately $1 billion. This acquisition added over $11 billion to the assets Blue Owl manages and focused specifically on digital infrastructure initiatives. This acquisition was completed as of January 5, 2025 and IPI’s Managing Partner, Matt A’Hearn has been appointed Head of Blue Owl’s digital infrastructure strategy. A Key Player In Digital Infrastructure and Data Centers With multi-billion-dollar joint ventures and financing initiatives, Blue Owl is positioning itself as a key player in the digital infrastructure space. The company investments in data centers, the implications of its strategic moves, and the broader impact on the AI and digital economy highlights the importance of investment in the data center to the economy overall. With the rapid growth of the data center industry, it is unsurprising that aggressive investment fund management is seeing it as an opportunity. Analysts continue to emphasize that the global data center market is expected to grow at a compound annual growth rate (CAGR) of 10.2% from 2023 to 2030, reaching $517.17 billion by the end of the decade. In this rapidly evolving landscape, Blue Owl Capital has emerged as a significant contributor. The firm’s investments in data centers are not just about capitalizing on current trends but also about shaping the future of digital infrastructure. Spreading the Wealth In August 2024, Blue Owl

Read More »

Global Data Center Operator Telehouse Launches Liquid Cooling Lab in the UK to Meet Ongoing AI and HPC Demand

Starting in early 2025, Telehouse International Corporation of Europe will offer an advanced liquid cooling lab at its newest data center, Telehouse South at the London Docklands campus in Blackwall Yard. Telehouse has partnered with four leading liquid-cooling technology vendors (Accelsius, JetCool, Legrand, and EkkoSense) to allow customers to explore different cooling technologies and management tools while evaluating their suitability for customer applications. Dr. Stu Redshaw, Chief Technology and Innovation Officer at EkkoSense, said about the project: Given that it’s not possible to run completely liquid-cooled data centers, the reality for most data center operators is that liquid cooling and air cooling will have an important role to play in the cooling mix – most likely as part of an evolving hybrid cooling approach. However, key engineering questions need answering before simply deploying liquid cooling – including establishing the exact blend of air and liquid cooling technologies you’ll need. And also recognizing the complexity of managing the operation of a hybrid air cooling and liquid cooling approach within the same room. This increases the

Flexential Partners with Lonestar to Support First Lunar Data Center

Flexential, a leading provider of secure and flexible data center solutions, this month announced that it has joined forces with Lonestar Data Holdings Inc. to support the upcoming launch of Freedom, Lonestar’s second lunar data center. Scheduled to launch aboard a SpaceX Falcon 9 rocket via Intuitive Machines, this mission is a critical step toward establishing a permanent data center on the Moon.

Ground-Based Support for Lunar Data Storage

Flexential’s Tampa data center will serve as the mission control platform for Lonestar’s lunar operations, providing colocation, interconnection, and professional services. The facility was chosen for its proximity to Florida’s Space Coast launch operations and its ability to deliver low-latency connectivity for critical functions. Flexential operates two data centers in Tampa and four in Florida as part of its FlexAnywhere® Platform, comprising more than 40 facilities across the U.S. “Flexential’s partnership with Lonestar represents our commitment to advancing data center capabilities beyond conventional boundaries,” said Jason Carolan, Chief Innovation Officer at Flexential. “By supporting Lonestar’s space-based data center initiative, we are helping to create new possibilities for data storage and disaster recovery. This project demonstrates how innovative data center expertise can help organizations prepare for a resilient future with off-world storage solutions.”

A New Era of Space-Based Resiliency

The growing demand for data center capacity, with U.S. power consumption expected to double from 17 GW in 2022 to 35 GW by 2030 (according to McKinsey & Company), is driving interest in space-based solutions. Storing data off-planet reduces reliance on terrestrial resources while enhancing security against natural disasters, warfare, and cyber threats.
The Freedom data center will provide resiliency, disaster recovery, and edge processing services for government and enterprise customers requiring the highest levels of data protection. The solar-powered data center leverages Solid-State Drives (SSDs) and a Field Programmable Gate Array (FPGA) edge

Why DeepSeek Is Great for AI and HPC and Maybe No Big Deal for Data Centers

In the rapidly evolving landscape of artificial intelligence (AI) and high-performance computing (HPC), the emergence of DeepSeek’s R1 model has sent ripples across industries. DeepSeek has been the data center industry’s topic of the week, for sure. The Chinese AI app surged to the top of US app store leaderboards last weekend, sparking a global selloff in technology shares Monday morning. But while some analysts predict a transformative impact within the industry, a closer examination suggests that, for data centers at large, the furor over DeepSeek might ultimately be much ado about nothing.

DeepSeek’s Breakthrough in AI and HPC

DeepSeek, a Chinese AI startup, this month unveiled its R1 model, claiming performance on par with, or even surpassing, leading models like OpenAI’s ChatGPT-4 and Anthropic’s Claude-3.5-Sonnet. Remarkably, DeepSeek developed this model at a fraction of the cost typically associated with such advancements, utilizing a cluster of 256 server nodes equipped with 2,048 GPUs. This efficiency has been attributed to innovative techniques and optimized resource utilization. AI researchers have been abuzz about the performance of the DeepSeek chatbot, which produces results similar to ChatGPT but is based on open-source models and was reportedly trained on older GPU chips. Some researchers are skeptical of claims about DeepSeek’s development costs and means, but its performance appears to challenge common assumptions about the computing cost of developing AI applications.

Market Reactions and Data Center Implications

The announcement of DeepSeek’s R1 model led to significant market reactions, with notable declines in tech stocks, including a substantial drop in Nvidia’s valuation.
This downturn was driven by concerns that more efficient AI models could reduce the demand for high-end hardware and, by extension, the expansive data centers that house them. For now, investors are re-assessing the

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote $200 billion between them to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd).

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
