LLaDA: The Diffusion Model That Could Redefine Language Generation

Stay Ahead, Stay ONMINE

LLaDA: The Diffusion Model That Could Redefine Language Generation

Introduction What if we could make language models think more like humans? Instead of writing one word at a time, what if they could sketch out their thoughts first, and gradually refine them? This is exactly what Large Language Diffusion Models (LLaDA) introduces: a different approach to current text generation used in Large Language Models (LLMs). Unlike traditional autoregressive models (ARMs), which predict text sequentially, left to right, LLaDA leverages a diffusion-like process to generate text. Instead of generating tokens sequentially, it progressively refines masked text until it forms a coherent response. In this article, we will dive into how LLaDA works, why it matters, and how it could shape the next generation of LLMs. I hope you enjoy the article! The current state of LLMs To appreciate the innovation that LLaDA represents, we first need to understand how current large language models (LLMs) operate. Modern LLMs follow a two-step training process that has become an industry standard: Pre-training: The model learns general language patterns and knowledge by predicting the next token in massive text datasets through self-supervised learning. Supervised Fine-Tuning (SFT): The model is refined on carefully curated data to improve its ability to follow instructions and generate useful outputs. Note that current LLMs often use RLHF as well to further refine the weights of the model, but this is not used by LLaDA so we will skip this step here. These models, primarily based on the Transformer architecture, generate text one token at a time using next-token prediction. Simplified Transformer architecture for text generation (Image by the author) Here is a simplified illustration of how data passes through such a model. Each token is embedded into a vector and is transformed through successive transformer layers. In current LLMs (LLaMA, ChatGPT, DeepSeek, etc), a classification head is used only on the last token embedding to predict the next token in the sequence. This works thanks to the concept of masked self-attention: each token attends to all the tokens that come before it. We will see later how LLaDA can get rid of the mask in its attention layers. Attention process: input embeddings are multiplied byQuery, Key, and Value matrices to generate new embeddings (Image by the author, inspired by [3]) If you want to learn more about Transformers, check out my article here. While this approach has led to impressive results, it also comes with significant limitations, some of which have motivated the development of LLaDA. Current limitations of LLMs Current LLMs face several critical challenges: Computational Inefficiency Imagine having to write a novel where you can only think about one word at a time, and for each word, you need to reread everything you’ve written so far. This is essentially how current LLMs operate — they predict one token at a time, requiring a complete processing of the previous sequence for each new token. Even with optimization techniques like KV caching, this process is quite computationally expensive and time-consuming. Limited Bidirectional Reasoning Traditional autoregressive models (ARMs) are like writers who could never look ahead or revise what they’ve written so far. They can only predict future tokens based on past ones, which limits their ability to reason about relationships between different parts of the text. As humans, we often have a general idea of what we want to say before writing it down, current LLMs lack this capability in some sense. Amount of data Existing models require enormous amounts of training data to achieve good performance, making them resource-intensive to develop and potentially limiting their applicability in specialized domains with limited data availability. What is LLaDA LLaDA introduces a fundamentally different approach to Language Generation by replacing traditional autoregression with a “diffusion-based” process (we will dive later into why this is called “diffusion”). Let’s understand how this works, step by step, starting with pre-training. LLaDA pre-training Remember that we don’t need any “labeled” data during the pre-training phase. The objective is to feed a very large amount of raw text data into the model. For each text sequence, we do the following: We fix a maximum length (similar to ARMs). Typically, this could be 4096 tokens. 1% of the time, the lengths of sequences are randomly sampled between 1 and 4096 and padded so that the model is also exposed to shorter sequences. We randomly choose a “masking rate”. For example, one could pick 40%. We mask each token with a probability of 0.4. What does “masking” mean exactly? Well, we simply replace the token with a special token: . As with any other token, this token is associated with a particular index and embedding vector that the model can process and interpret during training. We then feed our entire sequence into our transformer-based model. This process transforms all the input embedding vectors into new embeddings. We apply the classification head to each of the masked tokens to get a prediction for each. Mathematically, our loss function averages cross-entropy losses over all the masked tokens in the sequence, as below: Loss function used for LLaDA (Image by the author) 5. And… we repeat this procedure for billions or trillions of text sequences. Note, that unlike ARMs, LLaDA can fully utilize bidirectional dependencies in the text: it doesn’t require masking in attention layers anymore. However, this can come at an increased computational cost. Hopefully, you can see how the training phase itself (the flow of the data into the model) is very similar to any other LLMs. We simply predict randomly masked tokens instead of predicting what comes next. LLaDA SFT For auto-regressive models, SFT is very similar to pre-training, except that we have pairs of (prompt, response) and want to generate the response when giving the prompt as input. This is exactly the same concept for LlaDa! Mimicking the pre-training process: we simply pass the prompt and the response, mask random tokens from the response only, and feed the full sequence into the model, which will predict missing tokens from the response. The innovation in inference Innovation is where LLaDA gets more interesting, and truly utilizes the “diffusion” paradigm. Until now, we always randomly masked some text as input and asked the model to predict these tokens. But during inference, we only have access to the prompt and we need to generate the entire response. You might think (and it’s not wrong), that the model has seen examples where the masking rate was very high (potentially 1) during SFT, and it had to learn, somehow, how to generate a full response from a prompt. However, generating the full response at once during inference will likely produce very poor results because the model lacks information. Instead, we need a method to progressively refine predictions, and that’s where the key idea of ‘remasking’ comes in. Here is how it works, at each step of the text generation process: Feed the current input to the model (this is the prompt, followed by tokens) The model generates one embedding for each input token. We get predictions for the tokens only. And here is the important step: we remask a portion of them. In particular: we only keep the “best” tokens i.e. the ones with the best predictions, with the highest confidence. We can use this partially unmasked sequence as input in the next generation step and repeat until all tokens are unmasked. You can see that, interestingly, we have much more control over the generation process compared to ARMs: we could choose to remask 0 tokens (only one generation step), or we could decide to keep only the best token every time (as many steps as tokens in the response). Obviously, there is a trade-off here between the quality of the predictions and inference time. Let’s illustrate that with a simple example (in that case, I choose to keep the best 2 tokens at every step) LLaDA generation process example (Image by the author) Note, in practice, the remasking step would work as follows. Instead of remasking a fixed number of tokens, we would remask a proportion of s/t tokens over time, from t=1 down to 0, where s is in [0, t]. In particular, this means we remask fewer and fewer tokens as the number of generation steps increases. Example: if we want N sampling steps (so N discrete steps from t=1 down to t=1/N with steps of 1/N), taking s = (t-1/N) is a good choice, and ensures that s=0 at the end of the process. The image below summarizes the 3 steps described above. “Mask predictor” simply denotes the Llm (LLaDA), predicting masked tokens. Pre-training (a.), SFT (b.) and inference (c.) using LLaDA. (source: [1]) Can autoregression and diffusion be combined? Another clever idea developed in LLaDA is to combine diffusion with traditional autoregressive generation to use the best of both worlds! This is called semi-autoregressive diffusion. Divide the generation process into blocks (for instance, 32 tokens in each block). The objective is to generate one block at a time (like we would generate one token at a time in ARMs). For each block, we apply the diffusion logic by progressively unmasking tokens to reveal the entire block. Then move on to predicting the next block. Semi-autoregressive process (source: [1]) This is a hybrid approach: we probably lose some of the “backward” generation and parallelization capabilities of the model, but we better “guide” the model towards the final output. I think this is a very interesting idea because it depends a lot on a hyperparameter (the number of blocks), that can be tuned. I imagine different tasks might benefit more from the backward generation process, while others might benefit more from the more “guided” generation from left to right (more on that in the last paragraph). Why “Diffusion”? I think it’s important to briefly explain where this term actually comes from. It reflects a similarity with image diffusion models (like Dall-E), which have been very popular for image generation tasks. In image diffusion, a model first adds noise to an image until it’s unrecognizable, then learns to reconstruct it step by step. LLaDA applies this idea to text by masking tokens instead of adding noise, and then progressively unmasking them to generate coherent language. In the context of image generation, the masking step is often called “noise scheduling”, and the reverse (remasking) is the “denoising” step. How do Diffusion Models work? (source: [2]) You can also see LLaDA as some type of discrete (non-continuous) diffusion model: we don’t add noise to tokens, but we “deactivate” some tokens by masking them, and the model learns how to unmask a portion of them. Results Let’s go through a few of the interesting results of LLaDA. You can find all the results in the paper. I chose to focus on what I find the most interesting here. Training efficiency: LLaDA shows similar performance to ARMs with the same number of parameters, but uses much fewer tokens during training (and no RLHF)! For example, the 8B version uses around 2.3T tokens, compared to 15T for LLaMa3. Using different block and answer lengths for different tasks: for example, the block length is particularly large for the Math dataset, and the model demonstrates strong performance for this domain. This could suggest that mathematical reasoning may benefit more from the diffusion-based and backward process. Source: [1] Interestingly, LLaDA does better on the “Reversal poem completion task”. This task requires the model to complete a poem in reverse order, starting from the last lines and working backward. As expected, ARMs struggle due to their strict left-to-right generation process. Source: [1] LLaDA is not just an experimental alternative to ARMs: it shows real advantages in efficiency, structured reasoning, and bidirectional text generation. Conclusion I think LLaDA is a promising approach to language generation. Its ability to generate multiple tokens in parallel while maintaining global coherence could definitely lead to more efficient training, better reasoning, and improved context understanding with fewer computational resources. Beyond efficiency, I think LLaDA also brings a lot of flexibility. By adjusting parameters like the number of blocks generated, and the number of generation steps, it can better adapt to different tasks and constraints, making it a versatile tool for various language modeling needs, and allowing more human control. Diffusion models could also play an important role in pro-active AI and agentic systems by being able to reason more holistically. As research into diffusion-based language models advances, LLaDA could become a useful step toward more natural and efficient language models. While it’s still early, I believe this shift from sequential to parallel generation is an interesting direction for AI development. Thanks for reading! Check out my previous articles: References: [1] Liu, C., Wu, J., Xu, Y., Zhang, Y., Zhu, X., & Song, D. (2024). Large Language Diffusion Models. arXiv preprint arXiv:2502.09992. https://arxiv.org/pdf/2502.09992 [2] Yang, Ling, et al. “Diffusion models: A comprehensive survey of methods and applications.” ACM Computing Surveys 56.4 (2023): 1–39. [3] Alammar, J. (2018, June 27). The Illustrated Transformer. Jay Alammar’s Blog. https://jalammar.github.io/illustrated-transformer/

Introduction

What if we could make language models think more like humans? Instead of writing one word at a time, what if they could sketch out their thoughts first, and gradually refine them?

This is exactly what Large Language Diffusion Models (LLaDA) introduces: a different approach to current text generation used in Large Language Models (LLMs). Unlike traditional autoregressive models (ARMs), which predict text sequentially, left to right, LLaDA leverages a diffusion-like process to generate text. Instead of generating tokens sequentially, it progressively refines masked text until it forms a coherent response.

In this article, we will dive into how LLaDA works, why it matters, and how it could shape the next generation of LLMs.

I hope you enjoy the article!

The current state of LLMs

To appreciate the innovation that LLaDA represents, we first need to understand how current large language models (LLMs) operate. Modern LLMs follow a two-step training process that has become an industry standard:

Pre-training: The model learns general language patterns and knowledge by predicting the next token in massive text datasets through self-supervised learning.
Supervised Fine-Tuning (SFT): The model is refined on carefully curated data to improve its ability to follow instructions and generate useful outputs.

Note that current LLMs often use RLHF as well to further refine the weights of the model, but this is not used by LLaDA so we will skip this step here.

These models, primarily based on the Transformer architecture, generate text one token at a time using next-token prediction.

Simplified Transformer architecture for text generation (Image by the author)

Here is a simplified illustration of how data passes through such a model. Each token is embedded into a vector and is transformed through successive transformer layers. In current LLMs (LLaMA, ChatGPT, DeepSeek, etc), a classification head is used only on the last token embedding to predict the next token in the sequence.

This works thanks to the concept of masked self-attention: each token attends to all the tokens that come before it. We will see later how LLaDA can get rid of the mask in its attention layers.

Attention process: input embeddings are multiplied byQuery, Key, and Value matrices to generate new embeddings (Image by the author, inspired by [3])

If you want to learn more about Transformers, check out my article here.

While this approach has led to impressive results, it also comes with significant limitations, some of which have motivated the development of LLaDA.

Current limitations of LLMs

Current LLMs face several critical challenges:

Computational Inefficiency

Imagine having to write a novel where you can only think about one word at a time, and for each word, you need to reread everything you’ve written so far. This is essentially how current LLMs operate — they predict one token at a time, requiring a complete processing of the previous sequence for each new token. Even with optimization techniques like KV caching, this process is quite computationally expensive and time-consuming.

Limited Bidirectional Reasoning

Traditional autoregressive models (ARMs) are like writers who could never look ahead or revise what they’ve written so far. They can only predict future tokens based on past ones, which limits their ability to reason about relationships between different parts of the text. As humans, we often have a general idea of what we want to say before writing it down, current LLMs lack this capability in some sense.

Amount of data

Existing models require enormous amounts of training data to achieve good performance, making them resource-intensive to develop and potentially limiting their applicability in specialized domains with limited data availability.

What is LLaDA

LLaDA introduces a fundamentally different approach to Language Generation by replacing traditional autoregression with a “diffusion-based” process (we will dive later into why this is called “diffusion”).

Let’s understand how this works, step by step, starting with pre-training.

LLaDA pre-training

Remember that we don’t need any “labeled” data during the pre-training phase. The objective is to feed a very large amount of raw text data into the model. For each text sequence, we do the following:

We fix a maximum length (similar to ARMs). Typically, this could be 4096 tokens. 1% of the time, the lengths of sequences are randomly sampled between 1 and 4096 and padded so that the model is also exposed to shorter sequences.
We randomly choose a “masking rate”. For example, one could pick 40%.
We mask each token with a probability of 0.4. What does “masking” mean exactly? Well, we simply replace the token with a special token: . As with any other token, this token is associated with a particular index and embedding vector that the model can process and interpret during training.
We then feed our entire sequence into our transformer-based model. This process transforms all the input embedding vectors into new embeddings. We apply the classification head to each of the masked tokens to get a prediction for each. Mathematically, our loss function averages cross-entropy losses over all the masked tokens in the sequence, as below:

Loss function used for LLaDA (Image by the author)

5. And… we repeat this procedure for billions or trillions of text sequences.

Note, that unlike ARMs, LLaDA can fully utilize bidirectional dependencies in the text: it doesn’t require masking in attention layers anymore. However, this can come at an increased computational cost.

Hopefully, you can see how the training phase itself (the flow of the data into the model) is very similar to any other LLMs. We simply predict randomly masked tokens instead of predicting what comes next.

LLaDA SFT

For auto-regressive models, SFT is very similar to pre-training, except that we have pairs of (prompt, response) and want to generate the response when giving the prompt as input.

This is exactly the same concept for LlaDa! Mimicking the pre-training process: we simply pass the prompt and the response, mask random tokens from the response only, and feed the full sequence into the model, which will predict missing tokens from the response.

The innovation in inference

Innovation is where LLaDA gets more interesting, and truly utilizes the “diffusion” paradigm.

Until now, we always randomly masked some text as input and asked the model to predict these tokens. But during inference, we only have access to the prompt and we need to generate the entire response. You might think (and it’s not wrong), that the model has seen examples where the masking rate was very high (potentially 1) during SFT, and it had to learn, somehow, how to generate a full response from a prompt.

However, generating the full response at once during inference will likely produce very poor results because the model lacks information. Instead, we need a method to progressively refine predictions, and that’s where the key idea of ‘remasking’ comes in.

Here is how it works, at each step of the text generation process:

Feed the current input to the model (this is the prompt, followed by tokens)
The model generates one embedding for each input token. We get predictions for the tokens only. And here is the important step: we remask a portion of them. In particular: we only keep the “best” tokens i.e. the ones with the best predictions, with the highest confidence.
We can use this partially unmasked sequence as input in the next generation step and repeat until all tokens are unmasked.

You can see that, interestingly, we have much more control over the generation process compared to ARMs: we could choose to remask 0 tokens (only one generation step), or we could decide to keep only the best token every time (as many steps as tokens in the response). Obviously, there is a trade-off here between the quality of the predictions and inference time.

Let’s illustrate that with a simple example (in that case, I choose to keep the best 2 tokens at every step)

LLaDA generation process example (Image by the author)

Note, in practice, the remasking step would work as follows. Instead of remasking a fixed number of tokens, we would remask a proportion of s/t tokens over time, from t=1 down to 0, where s is in [0, t]. In particular, this means we remask fewer and fewer tokens as the number of generation steps increases.

Example: if we want N sampling steps (so N discrete steps from t=1 down to t=1/N with steps of 1/N), taking s = (t-1/N) is a good choice, and ensures that s=0 at the end of the process.

The image below summarizes the 3 steps described above. “Mask predictor” simply denotes the Llm (LLaDA), predicting masked tokens.

Pre-training (a.), SFT (b.) and inference (c.) using LLaDA. (source: [1])

Can autoregression and diffusion be combined?

Another clever idea developed in LLaDA is to combine diffusion with traditional autoregressive generation to use the best of both worlds! This is called semi-autoregressive diffusion.

Divide the generation process into blocks (for instance, 32 tokens in each block).
The objective is to generate one block at a time (like we would generate one token at a time in ARMs).
For each block, we apply the diffusion logic by progressively unmasking tokens to reveal the entire block. Then move on to predicting the next block.

Semi-autoregressive process (source: [1])

This is a hybrid approach: we probably lose some of the “backward” generation and parallelization capabilities of the model, but we better “guide” the model towards the final output.

I think this is a very interesting idea because it depends a lot on a hyperparameter (the number of blocks), that can be tuned. I imagine different tasks might benefit more from the backward generation process, while others might benefit more from the more “guided” generation from left to right (more on that in the last paragraph).

Why “Diffusion”?

I think it’s important to briefly explain where this term actually comes from. It reflects a similarity with image diffusion models (like Dall-E), which have been very popular for image generation tasks.

In image diffusion, a model first adds noise to an image until it’s unrecognizable, then learns to reconstruct it step by step. LLaDA applies this idea to text by masking tokens instead of adding noise, and then progressively unmasking them to generate coherent language. In the context of image generation, the masking step is often called “noise scheduling”, and the reverse (remasking) is the “denoising” step.

How do Diffusion Models work? (source: [2])

You can also see LLaDA as some type of discrete (non-continuous) diffusion model: we don’t add noise to tokens, but we “deactivate” some tokens by masking them, and the model learns how to unmask a portion of them.

Results

Let’s go through a few of the interesting results of LLaDA.

You can find all the results in the paper. I chose to focus on what I find the most interesting here.

Training efficiency: LLaDA shows similar performance to ARMs with the same number of parameters, but uses much fewer tokens during training (and no RLHF)! For example, the 8B version uses around 2.3T tokens, compared to 15T for LLaMa3.
Using different block and answer lengths for different tasks: for example, the block length is particularly large for the Math dataset, and the model demonstrates strong performance for this domain. This could suggest that mathematical reasoning may benefit more from the diffusion-based and backward process.

Interestingly, LLaDA does better on the “Reversal poem completion task”. This task requires the model to complete a poem in reverse order, starting from the last lines and working backward. As expected, ARMs struggle due to their strict left-to-right generation process.

LLaDA is not just an experimental alternative to ARMs: it shows real advantages in efficiency, structured reasoning, and bidirectional text generation.

Conclusion

I think LLaDA is a promising approach to language generation. Its ability to generate multiple tokens in parallel while maintaining global coherence could definitely lead to more efficient training, better reasoning, and improved context understanding with fewer computational resources.

Beyond efficiency, I think LLaDA also brings a lot of flexibility. By adjusting parameters like the number of blocks generated, and the number of generation steps, it can better adapt to different tasks and constraints, making it a versatile tool for various language modeling needs, and allowing more human control. Diffusion models could also play an important role in pro-active AI and agentic systems by being able to reason more holistically.

As research into diffusion-based language models advances, LLaDA could become a useful step toward more natural and efficient language models. While it’s still early, I believe this shift from sequential to parallel generation is an interesting direction for AI development.

Thanks for reading!

Check out my previous articles:

References:

[1] Liu, C., Wu, J., Xu, Y., Zhang, Y., Zhu, X., & Song, D. (2024). Large Language Diffusion Models. arXiv preprint arXiv:2502.09992. https://arxiv.org/pdf/2502.09992
[2] Yang, Ling, et al. “Diffusion models: A comprehensive survey of methods and applications.” ACM Computing Surveys 56.4 (2023): 1–39.
[3] Alammar, J. (2018, June 27). The Illustrated Transformer. Jay Alammar’s Blog. https://jalammar.github.io/illustrated-transformer/

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Cisco fixes critical IMC auth bypass present in many products

Cisco has released patches for a critical vulnerability in its out-of-band management solution, present in many of its servers and appliances. The flaw allows unauthenticated remote attackers to gain admin access to the Cisco Integrated Management Controller (IMC), which gives administrators remote control over servers even when the main OS

Kyndryl service targets AI agent automation, security

Understand agents, serving as a single source of truth to help mitigate the risks associated with shadow AI. Validate each agent before launch by testing for security, resilience, and policy compliance to ensure they meet your standards before going live. Maintain control with real-time guardrails that keep agents operating within

Why can’t we have nice routers anymore?

In the Volt Typhoon and Flax Typhoon attacks, the routers themselves weren’t compromised because they were foreign-made routers. Far from it! They were compromised because they were unpatched, Internet-exposed, and end-of-life. The router manufacturers were no more guilty of opening the doors to these attacks than Microsoft is for your

New tool on AWS makes it easier to develop quantum error correction

Constellation is available via Quantum Elements and runs on AWS, says Izhar Medalsy, co-founder and CEO at Quantum Elements. And it is designed to help quantum researchers develop and test error correction strategies. Alternatives, such as the popular Stim simulator from Google Quantum AI, don’t simulate all the potential sources

Energy Department Authorizes Additional Exports of LNG from Elba Island Terminal, Strengthening Global Energy Supply with U.S. LNG

WASHINGTON—U.S. Secretary of Energy Chris Wright today authorized an immediate 22% increase in exports of liquefied natural gas (LNG) from the Elba Island Terminal in Chatham County, Georgia. With today’s order, Kinder Morgan subsidiary Southern LNG Company L.L.C., operator of the Elba Island LNG Terminal, is now authorized to export up to an additional 28.25 (Bcf/yr) to non-free trade agreement countries, strengthening global natural gas supplies with reliable U.S. LNG. Elba Island was previously authorized to export up to 130 billion cubic feet per year (Bcf/yr) of natural gas as LNG to non-free trade agreement countries and has been exporting U.S. LNG since 2019. The project is positioned to export the additional approved volumes immediately. “At a time when global energy supply routes face disruption, the United States remains a reliable energy partner to our allies and trading partners,” said DOE Assistant Secretary of the Hydrocarbons and Geothermal Energy Office, Kyle Haustveit. “DOE is using all available authorities to ensure American energy can reach global markets when it is needed most, supporting energy security and helping stabilize global energy supplies.” The action comes as global oil and LNG supply routes face disruption from tensions in the Middle East and attacks carried out by Iran and its proxies, threatening the reliable flow of energy through critical maritime corridors. The Department will continue to act, using its full set of authorities, to ensure U.S. LNG remains a dependable energy source in global energy markets and a stabilizing presence in times of disruption. Thanks to President Trump’s leadership and American innovation, the United States is the world’s largest natural gas producer and exporter, with exports reaching all-time highs in March 2026. Since President Trump ended the previous administration’s LNG export approval ban, the Department has approved more than 19 Bcf/d of LNG export authorizations. With recent final investment decisions for additional export capacity, U.S. LNG exports are set

Energy Department Initiates Additional Strategic Petroleum Reserve Emergency Exchange to Stabilize Global Oil Supply

WASHINGTON—The U.S. Department of Energy (DOE) issued a Request for Proposal (RFP) today for an emergency exchange of 10-million-barrels from the Strategic Petroleum Reserve (SPR). This action is part of the coordinated release of 400-million-barrels from IEA member nations’ strategic reserves President Trump previously announced. The United States continues to deliver on its 172-million-barrel release commitment. The crude oil will originate from the Strategic Petroleum Reserve’s (SPR) Bryan Mound site. Today’s action builds on the initial phase of the Emergency Exchange, which moved quickly to award 45.2 million barrels from the Bayou Choctaw, Bryan Mound, and West Hackberry SPR sites. The 10-million-barrel exchange leverages the full capabilities of the SPR, alongside the President’s limited Jones Act waiver, to accelerate critical near-term oil flows into the market. “Today’s action furthers the United States’ efforts to move oil quickly to the market and mitigate short-term supply disruptions,” said DOE Assistant Secretary of the Hydrocarbons and Geothermal Energy Office Kyle Haustveit. “Thanks to President Trump, America is managing our national security assets responsibly again. Through this exchange, we will continue to refill the Strategic Petroleum Reserve by bringing additional barrels back at a later date through this pragmatic exchange structure, strengthening its long-term readiness and all at no cost to the American taxpayer.” Under DOE’s exchange authority, participating companies will return the borrowed 10 million barrels with additional premium barrels by next year. This exchange delivers immediate crude to refiners and the market while generating additional barrels for the American people at no cost to taxpayers. Bids for the solicitation are due no later than 11:00 A.M. CT on Monday, April 6, 2026.   For more information on the SPR, please visit DOE’s website. 

Trump Administration Keeps Colorado Coal Plant Open to Ensure Affordable, Reliable and Secure Power in Colorado

WASHINGTON—U.S. Secretary of Energy Chris Wright today issued an emergency order to keep a Colorado coal plant operational to ensure Americans maintain access to affordable, reliable and secure electricity. The order directs Tri-State Generation and Transmission Association (Tri-State), Platte River Power Authority, Salt River Project, PacifiCorp, and Public Service Company of Colorado (Xcel Energy), in coordination with the Western Area Power Administration (WAPA) Rocky Mountain Region and Southwest Power Pool (SPP), to take all measures necessary to ensure that Unit 1 at the Craig Station in Craig, Colorado is available to operate. Unit One of the coal plant was scheduled to shut down at the end of 2025 but on December 30, 2025, Secretary Wright issued an emergency order directing Tri-State and the co-owners to ensure that Unit 1 at the Craig Station remains available to operate. “The last administration’s energy subtraction policies threatened America’s energy security and positioned our nation to likely experience significantly more blackouts in the coming years—thankfully, President Trump won’t let that happen,” said Energy Secretary Wright. “The Trump Administration will continue taking action to ensure we don’t lose critical generation sources. Americans deserve access to affordable, reliable, and secure energy to power their homes all the time, regardless of whether the wind is blowing or the sun is shining.” Thanks to President Trump’s leadership, coal plants across the country are reversing plans to shut down. In 2025, more than 17 gigawatts (GW) of coal-power electricity generation were saved. On April 1, once Tri-State and the WAPA Rocky Mountain Region join the SPP RTO West expansion, SPP is directed to take every step to employ economic dispatch to minimize costs to ratepayers. According to DOE’s Resource Adequacy Report, blackouts were on track to potentially increase 100 times by 2030 if the U.S. continued to take reliable

NextDecade contractor Bechtel awards ABB more Rio Grande LNG automation work

NextDecade Corp. contractor Bechtel Corp. has awarded ABB Ltd. additional integrated automation and electrical solution orders, extending its scope to Trains 4 and 5 of NextDecade’s 30-million tonne/year (tpy) Rio Grande LNG (RGLNG) plant in Brownsville, Tex. The orders were booked in third- and fourth-quarters 2025 and build on ABB’s Phase 1 work with Trains 1-3, totaling 17 million tpy. The scope for RGLNG Trains 4 and 5 includes deployment of an integrated control and safety system consisting of a distributed control system, emergency shutdown, and fire and gas systems. An electrical controls and monitoring system will provide unified visibility of the plant’s electrical infrastructure. These two overarching solutions will provide a common automation platform. ABB will also supply medium-voltage drives, synchronous motors, transformers, motor controllers and switchgear. The orders also include local equipment buildings—two for Train 4 and one for Train 5— housing critical control and electrical systems in prefabricated modules to streamline installation and commissioning on site. The solutions being delivered to Bechtel use ABB adaptive execution, a methodology for capital projects designed to optimize engineering work and reduce delivery timelines. Phase 1 of RGLNG is under construction and expected to begin operations in 2027. Operations at Train 4 are expected in 2030 and Train 5 in 2031. ABB’s senior vice-president for the Americas, Scott McCay, confirmed to Oil & Gas Journal at CERAWeek by S&P Global in Houston that the company is doing similar work through Tecnimont for Argent LNG’s planned 25-million tpy plant in Port Fourchon, La.; 10-million tpy Phase 1 and 15-million tpy Phase 2. Argent is targeting 2030 completion for its plant.

Persistent oil flow imbalances drive Enverus to increase crude price forecast

Citing impacts from the Iran war, near-zero flows through the Strait of Hormuz, accelerating global stock draws, and expectations for a muted US production response despite higher prices, Enverus Intelligence Research (EIR) raised its Brent crude oil price forecast. EIR now expects Brent to average $95/bbl for the remainder of 2026 and $100/bbl in 2027, reflecting what it described as a persistent global oil flow imbalance that continues to draw down inventories. “The world has an oil flow problem that is draining stocks,” said Al Salazar, director of research at EIR. “Whenever that oil flow problem is resolved, the world is left with low stocks. That’s what drives our oil price outlook higher for longer.” The outlook assumes the Strait of Hormuz remains largely closed for 3 months. EIR estimates that each month of constrained flows shifts the price outlook by about $10–15/bbl, underscoring the scale of the disruption and uncertainty around its duration. Despite West Texas Intermediate (WTI) prices of $90–100/bbl, EIR does not expect US producers to materially increase output. The firm forecasts US liquids production growth of 370,000 b/d by end-2026 and 580,000 b/d by end-2027, citing drilling-to-production lags, industry consolidation, and continued capital discipline. Global oil demand growth for 2026 has been reduced to about 500,000 b/d from 1.0 million b/d as higher energy prices and anticipated supply disruptions weigh on economic activity. Cumulative global oil stock draws are estimated at roughly 1 billion bbl through 2027, with non-OECD inventories—particularly in Asia—absorbing nearly half of the impact. A 60-day Jones Act waiver may provide limited short-term US shipping flexibility, but EIR said the measure is unlikely to materially affect global oil prices given broader market forces.

Equinor begins drilling $9-billion natural gas development project offshore Brazil

Equinor has started drilling the Raia natural gas project in the Campos basin presalt offshore Brazil. The $9-billion project is Equinor’s largest international investment, its largest project under execution, and marks the deepest water depth operation in its portfolio. The drilling campaign, which began Mar. 24 with the Valaris DS‑17 drillship, includes six wells in the Raia area 200 km offshore in water depths of around 2,900 m. The area is expected to hold recoverable natural gas and condensate reserves of over 1 billion boe. Raia’s development concept is based on production through wells connected to a 126,000-b/d floating production, storage and offloading unit (FPSO), which will treat produced oil/condensate and gas. Natural gas will be transported through a 200‑km pipeline from the FPSO to Cabiúnas, in the city of Macaé, Rio de Janeiro state. Once in operation, expected in 2028, the project will have the capacity to export up to 16 million cu m/day of natural gas, which could represent 15% of Brazil’s natural gas demand, the company said in a release Mar. 24. “While drilling takes place, integration and commissioning activities on the FPSO are progressing well putting us on track towards a safe start of operations in 2028,” said Geir Tungesvik, executive vice-president, projects, drilling and procurement, Equinor. The Raia project is operated by Equinor (35%), in partnership with Repsol Sinopec Brasil (35%) and Petrobras (30%).

Google Research touts memory-compression breakthrough for AI processing

The last time the market witnessed a shakeup like this was China’s DeepSeek, but doubts emerged quickly about its efficacy. Developers found DeepSeek’s efficiency gains required deep architectural decisions that had to be built in from the start. TurboQuant requires no retraining or fine-tuning. You just drop it straight into existing inference pipelines, at least in theory. If it works in production systems with no retrofitting, then data center operators will get tremendous performance gains on existing hardware. Data center operators won’t have to throw hardware at the performance problem. However, analysts urge caution before jumping to conclusions. “This is a research breakthrough, not a shipping product,” said Alex Cordovil, research director for physical infrastructure at The Dell’Oro Group. “There’s often a meaningful gap between a published paper and real-world inference workloads.” Also, Dell’Oro notes that efficiency gains in AI compute tend to get consumed by more demand, known as the Jevons paradox. “Any freed-up capacity would likely be absorbed by frontier models expanding their capabilities rather than reducing their hardware footprint.” Jim Handy, president of Objective Analysis, agrees on that second part. “Hyperscalers won’t cut their spending – they’ll just spend the same amount and get more bang for their buck,” he said. “Data centers aren’t looking to reach a certain performance level and subsequently stop spending on AI. They’re looking to out-spend each other to gain market dominance. This won’t change that.” Google plans to present a paper outlining TurboQuant at the ICLR conference in Rio de Janeiro running from April 23 through April 27.

Amazon Middle East datacenter suffers second drone hit as Iran steps up attacks

Amazon was contacted for comment on the latest Bahrain drone incident, but said it had nothing to add beyond the statement in its current advisory. Denial of infrastructure Doing the damage is the Shaheed 136, a small and unsophisticated drone designed to overwhelm defenders with numbers. If only one in twenty reaches its target, the price-performance still exceeds that of more expensive systems. When aimed at critical infrastructure such as datacenters, the effect is also psychological; the threat of an attack on its own can be enough to make it difficult for organizations to continue using an at-risk facility. Iran’s targeting of the Bahrain datacenter is unlikely to be random. Amazon opened its ME-SOUTH-1 AWS presence in 2019, and it is still believed to be the company’s largest site in the Middle East. Earlier this week, the Islamic Revolutionary Guard Corps (IRGC) Telegram channel explicitly threatened to target at least 18 US companies operating in the region, including Microsoft, Google, Nvidia, and Apple. This follows similar threats to an even longer list of US companies made on the IRGC-affiliated Tasnim News Agency in recent weeks. That strategy doesn’t bode well for US companies that have made large investments in Middle Eastern datacenter infrastructure in recent years, drawn by the growing wealth and influence of countries in the region. This includes Amazon, which has announced plans to build a $5.3 billion datacenter in Saudi Arabia, due to become available in 2026. If this is now under threat, whether by warfare or the hypothetical possibility of attack, that will create uncertainty.

Data Center Jobs: Engineering, Construction, Commissioning, Sales, Field Service and Facility Tech Jobs Available in Major Data Center Hotspots

Each month Data Center Frontier, in partnership with Pkaza, posts some of the hottest data center career opportunities in the market. Here’s a look at some of the latest data center jobs posted on the Data Center Frontier jobs board, powered by Pkaza Critical Facilities Recruiting. Looking for Data Center Candidates? Check out Pkaza’s Active Candidate / Featured Candidate Hotlist Power Applications Engineer Pittsburgh, PA This position is also available in: Denver, CO and Andrews, SC. Our client is a leading provider and manufacturer of industrial electrical power equipment used in industrial applications for mission critical operations. They help their customers save money by reducing energy and operating costs and provide solutions for modernizing their customer’s existing electrical infrastructure. This company provides cooling solutions to many of the world’s largest organizations and government facilities and enterprise clients, colocation providers and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive salaries and benefits. Electrical Commissioning Engineer Ashburn, VA This traveling position is also available in: New York, NY; White Plains, NY; Dallas, TX; Richmond, VA; Montvale, NJ; Charlotte, NC; Atlanta, GA; Hampton, GA; New Albany, OH; Cedar Rapids, IA; Phoenix, AZ; Salt Lake City, UT; Kansas City, MO; Omaha, NE; Chesterton, IN or Chicago, IL. *** ALSO looking for a LEAD EE and ME CxA Agents and CxA PMs. *** Our client is an engineering design and commissioning company that has a national footprint and specializes in MEP critical facilities design. They provide design, commissioning, consulting and management expertise in the critical facilities space. They have a mindset to provide reliability, energy efficiency, sustainable design and LEED expertise when providing these consulting services for enterprise, colocation and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive

No joke: data centers are warming the planet

The researchers also made use of a database provided by the International Energy Agency (IEA) that the authors pointed out contains more than 11,000 locations worldwide, of which 8,472 have been detected to dwell outside of highly dense urban areas. The latter locations were then used to “quantify the effect of data centers on the environment in terms of the LST gradient that could be measured on the areas surrounding each data center.” Asking the wrong question Asked if AI data centers are really causing local warming, or if this phenomenon is overstated, Sanchit Vir Gogia, chief analyst at Greyhound Research, said, “the signal is real, but the industry is asking the wrong question. The research shows a consistent rise in land surface temperature of around 2°C following the establishment of large data centre facilities.” The debate, however, “has quickly shifted to causality: whether this is driven by operational heat from compute, or by land transformation during construction. That distinction matters scientifically, but it does not change the strategic implication.” Land surface temperature, said Gogia, is not the same as air temperature, and that gap will be used to challenge the findings. “But dismissing the signal on that basis would be a mistake,” he noted. “Data centers concentrate energy use, replace natural surfaces with heat-retaining materials, and continuously reject heat into the environment. Those are known drivers of thermal change.” He added, “the uncomfortable truth is this: Even if the exact mechanism is debated, the outcome aligns with first principles. Infrastructure at this scale alters its surroundings. The industry does not yet have a clean way to separate construction impact from operational impact, and that ambiguity makes the risk harder to model, not easier. This is not overstated, it is under-interpreted.” Location strategy must change But will the findings change

Schneider Electric Maps the AI Data Center’s Next Design Era

The coming shift to higher-voltage DC That internal power challenge led Simonelli to one of the most consequential architectural topics in the interview: the likely transition toward higher-voltage DC distribution at very high rack densities. He framed it pragmatically. At current density levels, the industry knows how to get power into racks at 200 or 300 kilowatts. But as densities rise toward 400 kilowatts and beyond, conventional AC approaches start to run into physical limits. Too much cable, too much copper, too much conversion equipment, and too much space consumed by power infrastructure rather than GPUs. At that point, he said, higher-voltage DC becomes attractive not for philosophical reasons, but because it reduces current, shrinks conductor size, saves space, and leaves more room for revenue-generating compute. “It is again a paradigm shift,” Simonelli said of DC power at these densities. “But it won’t be everywhere.” That is probably right. The transition will not be universal, and the exact thresholds will evolve. But his underlying point is powerful. As rack densities climb, electrical architecture starts to matter not only for efficiency and reliability, but for physical space allocation inside the rack. Put differently, power distribution becomes a compute-enablement issue. Distance between accelerators matters, too. The closer GPUs and TPUs can be kept together, the better they perform. If power infrastructure can be compacted, more of the rack can be devoted to dense compute, improving the economics and performance of the system. That is a strong example of how AI is collapsing traditional boundaries between facility engineering and compute architecture. The two are no longer cleanly separable. Gas now, renewables over time On onsite power, Simonelli was refreshingly direct. If the goal is dispatchable onsite generation at the scale now being contemplated for AI facilities, he said, “there really isn’t an alternative

SoftBank’s 10 GW Ohio Campus Marks a Turning Point for AI Infrastructure

Renewables can reduce carbon intensity, but they cannot independently meet the need for continuous, multi-gigawatt firm capacity without large-scale storage and balancing resources. For developers targeting guaranteed availability within this decade, natural gas remains the most readily deployable option, despite the political and environmental tradeoffs it introduces. AEP and the Cost Allocation Model If the generation plan explains the engineering logic, the AEP structure speaks to the political one. At the center is one of the most contested questions in the data center market: who pays for the transmission and grid upgrades required to serve large new loads? Utilities, regulators, consumer advocates, and large-load customers are increasingly divided on this issue. Data center developers point to economic development benefits, including jobs and tax revenue. Consumer advocates counter that residential ratepayers should not subsidize infrastructure built primarily to serve hyperscale demand. The Ohio arrangement is being positioned as a response to that conflict. DOE states that SB Energy and AEP Ohio are partnering on $4.2 billion in new transmission infrastructure, with SB Energy committing to fund those investments rather than passing costs through to ratepayers. AEP has echoed that position, indicating the structure is intended to avoid upward pressure on transmission rates for Ohio customers. Whether that outcome holds will depend on regulatory review and execution. But the structure itself is significant. It frames a model in which large-load developers directly fund the transmission infrastructure required to support their projects, rather than relying on broader cost recovery mechanisms. That makes the project more than a construction milestone. It positions it as a potential policy template. If validated, this approach could influence how utilities and regulators across the U.S. address cost allocation for AI-scale infrastructure, particularly as similar disputes intensify in constrained grid regions. Why 765-kV Transmission Signals Scale AEP says the

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle