How do AI models generate videos?

Sure, the clips you see in demo reels are cherry-picked to showcase a company’s models at the top of their game. But with the technology in the hands of more users than ever before (Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers), even the most casual filmmaker can now knock out something remarkable.

The downside is that creators are competing with AI slop, and social media feeds are filling up with faked news footage. Video generation also uses up a huge amount of energy, many times more than text or image generation.

With AI-generated videos everywhere, let’s take a moment to talk about the tech that makes them work.

How do you generate a video?

Let’s assume you’re a casual user. There is now a range of high-end tools that allow pro video makers to insert video generation models into their workflows. But most people will use this technology in an app or via a website. You know the drill: “Hey, Gemini, make me a video of a unicorn eating spaghetti. Now make its horn take off like a rocket.” What you get back will be hit or miss, and you’ll typically need to ask the model to take another pass or 10 before you get more or less what you wanted.

So what’s going on under the hood? Why is it hit or miss, and why does it take so much energy? The latest wave of video generation models is built on what are known as latent diffusion transformers. Yes, that’s quite a mouthful. Let’s unpack each part in turn, starting with diffusion.

What’s a diffusion model?

Imagine taking an image and adding a random spattering of pixels to it. Take that pixel-spattered image and spatter it again and then again. Do that enough times and you will have turned the initial image into a random mess of pixels, like static on an old TV set.
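To make the spattering concrete, here is a minimal sketch of that noising process in Python. The tiny eight-value “image,” the noise strength `beta`, and the function names are all assumptions made up for illustration; real systems work on full images and use carefully tuned noise schedules.

```python
import math
import random

def add_noise(pixels, beta, rng):
    """One round of spattering: shrink the signal a little, mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

def signal_fraction(beta, steps):
    """How much of each original pixel value survives after `steps` rounds."""
    return math.sqrt((1.0 - beta) ** steps)

rng = random.Random(0)
image = [1.0, -1.0, 1.0, -1.0, 0.5, -0.5, 0.0, 1.0]  # toy 8-"pixel" image
noisy = image
for _ in range(1000):
    noisy = add_noise(noisy, beta=0.02, rng=rng)

# After many rounds almost nothing of the original remains: `noisy` is static.
print(signal_fraction(0.02, 1000))
```

Each round keeps slightly less of the original, so the surviving fraction shrinks geometrically. That is why enough rounds leave nothing but static.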

A diffusion model is a neural network trained to reverse that process, turning random static into images. During training, it gets shown millions of images in various stages of pixelation. It learns how those images change each time new pixels are thrown at them and, thus, how to undo those changes.

The upshot is that when you ask a diffusion model to generate an image, it will start off with a random mess of pixels and step by step turn that mess into an image that is more or less similar to images in its training set.
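The step-by-step cleanup can be sketched in a few lines of Python. This is a toy under heavy assumptions: the “training set” is a single known image, so the noise estimate can be computed exactly with a formula instead of a trained neural network. The update rule follows the standard denoising-diffusion recipe; the names `oracle_noise_estimate` and `generate` are hypothetical.

```python
import math
import random

def oracle_noise_estimate(x_t, abar_t, known_image):
    # Stand-in for a trained network: with only one possible clean image,
    # the noise hidden in x_t can be solved for directly.
    scale = math.sqrt(1.0 - abar_t)
    root = math.sqrt(abar_t)
    return [(x - root * k) / scale for x, k in zip(x_t, known_image)]

def generate(known_image, steps=200, beta=0.02, seed=0):
    rng = random.Random(seed)
    alpha = 1.0 - beta
    x = [rng.gauss(0.0, 1.0) for _ in known_image]  # start from pure static
    for t in range(steps, 0, -1):
        abar_t = alpha ** t  # fraction of signal variance left at step t
        eps = oracle_noise_estimate(x, abar_t, known_image)
        coef = beta / math.sqrt(1.0 - abar_t)
        # Remove a small slice of the estimated noise.
        x = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 1:
            # Re-inject a little fresh noise on every step except the last.
            x = [xi + math.sqrt(beta) * rng.gauss(0.0, 1.0) for xi in x]
    return x

target = [1.0, -1.0, 0.5, 0.0]
print(generate(target))
```

With a one-image training set, the loop can only ever land on that image. A real model has learned from millions of images, so the same loop lands on something new that resembles them.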

But you don’t want any image—you want the image you specified, typically with a text prompt. And so the diffusion model is paired with a second model—such as a large language model (LLM) trained to match images with text descriptions—that guides each step of the cleanup process, pushing the diffusion model toward images that the large language model considers a good match to the prompt.
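One way to picture the guide’s push is the scheme researchers call classifier guidance. At each step, the guide says which direction would make the image a better match, and the noise estimate is shifted accordingly. The sketch below assumes that scheme and a made-up guide whose idea of a “good match” is closeness to a target. Commercial products may steer their models differently, and the article’s sources do not specify the method.

```python
import math

def toy_match_gradient(x_t, prompt_target):
    # Hypothetical guide: score = -0.5 * ||x - target||^2,
    # so the direction of improvement is simply (target - x).
    return [p - x for x, p in zip(x_t, prompt_target)]

def guided_noise_estimate(eps, abar_t, match_gradient, scale):
    """Shift the noise estimate along the guide's gradient (classifier-style guidance)."""
    shift = scale * math.sqrt(1.0 - abar_t)
    return [e - shift * g for e, g in zip(eps, match_gradient)]

x_t = [0.0, 0.0]
eps = [0.0, 0.0]            # what the unguided model thinks the noise is
target = [1.0, -1.0]        # what the guide considers a good match
grad = toy_match_gradient(x_t, target)
guided = guided_noise_estimate(eps, abar_t=0.5, match_gradient=grad, scale=2.0)
print(guided)
```

A larger `scale` means a harder push toward the prompt; a scale of zero switches the guide off and leaves the model’s own estimate untouched.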

An aside: This LLM isn’t pulling the links between text and images out of thin air. Most text-to-image and text-to-video models today are trained on large data sets that contain billions of pairings of text and images or text and video scraped from the internet (a practice many creators are very unhappy about). This means that what you get from such models is a distillation of the world as it’s represented online, distorted by prejudice (and pornography).

It’s easiest to imagine diffusion models working with images. But the technique can be used with many kinds of data, including audio and video. To generate movie clips, a diffusion model must clean up sequences of images—the consecutive frames of a video—instead of just one image.

What’s a latent diffusion model?

All this takes a huge amount of compute (read: energy). That’s why most diffusion models used for video generation use a technique called latent diffusion. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest.

A similar thing happens whenever you stream a video over the internet: A video is sent from a server to your screen in a compressed format to make it get to you faster, and when it arrives, your computer or TV will convert it back into a watchable video.

And so the final step is to decompress what the latent diffusion process has come up with. Once the compressed frames of random static have been turned into the compressed frames of a video that the LLM guide considers a good match for the user’s prompt, the compressed video gets converted into something you can watch.
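A quick back-of-the-envelope calculation shows why the compression matters. The compression factors below (4x in time, 8x along each spatial axis, 16 latent channels) are illustrative assumptions, not the published settings of any particular model.

```python
def raw_values(frames, height, width, channels=3):
    """Count of numbers needed to store uncompressed RGB video."""
    return frames * height * width * channels

def latent_values(frames, height, width, time_factor, space_factor, latent_channels):
    """Count of numbers in the compressed (latent) version of the same clip."""
    return ((frames // time_factor)
            * (height // space_factor)
            * (width // space_factor)
            * latent_channels)

# A 120-frame clip at 1920x1080, with assumed compression factors.
raw = raw_values(120, 1080, 1920)
latent = latent_values(120, 1080, 1920, time_factor=4, space_factor=8, latent_channels=16)
print(raw, latent, raw / latent)
```

Under these assumptions the diffusion process handles dozens of times fewer numbers at every cleanup step, and those savings add up across all the steps.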

With latent diffusion, the diffusion process works more or less the way it would for an image. The difference is that the pixelated video frames are now mathematical encodings of those frames rather than the frames themselves. This makes latent diffusion far more efficient than a typical diffusion model. (Even so, video generation still uses more energy than image or text generation. There’s just an eye-popping amount of computation involved.)

What’s a latent diffusion transformer?

Still with me? There’s one more piece to the puzzle—and that’s how to make sure the diffusion process produces a sequence of frames that are consistent, maintaining objects and lighting and so on from one frame to the next. OpenAI did this with Sora by combining its diffusion model with another kind of model called a transformer. This has now become standard in generative video.

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models such as OpenAI’s GPT-5 and Google DeepMind’s Gemini, which can generate long sequences of words that make sense, maintaining consistency across many dozens of sentences.

But videos are not made of words. Instead, videos get cut into chunks that can be treated as if they were. The approach that OpenAI came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a lead researcher on Sora.
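Brooks’s stack-of-frames picture translates almost directly into code. The sketch below cuts a toy video (a nested list indexed by frame, row, and column) into small cubes and flattens each cube into one token-like chunk. The function names and the 2x2x2 cube size are assumptions for illustration; Sora’s actual patch sizes and layout are not described here.

```python
def spacetime_patches(video, pt, ph, pw):
    """Cut a video into pt x ph x pw cubes; return one flat list of values per cube."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                cube = [video[t][y][x]
                        for t in range(t0, t0 + pt)
                        for y in range(y0, y0 + ph)
                        for x in range(x0, x0 + pw)]
                patches.append(cube)
    return patches

def make_video(T, H, W):
    """Toy video whose every value encodes its own (frame, row, column) position."""
    return [[[t * H * W + y * W + x for x in range(W)] for y in range(H)]
            for t in range(T)]

landscape = spacetime_patches(make_video(4, 4, 8), 2, 2, 2)  # wide clip
portrait = spacetime_patches(make_video(2, 8, 4), 2, 2, 2)   # short vertical clip
print(len(landscape), len(portrait))
```

The same function handles both clips. A differently shaped video just produces a different number of cubes, and a transformer can process sequences of varying length.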

A selection of videos generated with Veo 3 and Midjourney. The clips have been enhanced in postproduction with Topaz, an AI video-editing tool. Credit: VaigueMan

Using transformers alongside diffusion models brings several advantages. Because they are designed to process sequences of data, transformers also help the diffusion model maintain consistency across frames as it generates them. This makes it possible to produce videos in which objects don’t pop in and out of existence, for example.

And because the videos are diced up, their size and orientation do not matter. This means that the latest wave of video generation models can be trained on a wide range of example videos, from short vertical clips shot with a phone to wide-screen cinematic films. The greater variety of training data has made video generation far better than it was just two years ago. It also means that video generation models can now be asked to produce videos in a variety of formats.

What about the audio?

A big advance with Veo 3 is that it generates video with audio, from lip-synched dialogue to sound effects to background noise. That’s a first for video generation models. As Google DeepMind CEO Demis Hassabis put it at Google I/O 2025: “We’re emerging from the silent era of video generation.”

The challenge was to find a way to line up video and audio data so that the diffusion process would work on both at the same time. Google DeepMind’s breakthrough was a new way to compress audio and video into a single piece of data inside the diffusion model. When Veo 3 generates a video, its diffusion model produces audio and video together in a lockstep process, ensuring that the sound and images are synched.

You said that diffusion models can generate different kinds of data. Is this how LLMs work too?

No—or at least not yet. Diffusion models are most often used to generate images, video, and audio. Large language models—which generate text (including computer code)—are built using transformers. But the lines are blurring. We’ve seen how transformers are now being combined with diffusion models to generate videos. And this summer Google DeepMind revealed that it was building an experimental large language model that used a diffusion model instead of a transformer to generate text.

Here’s where things start to get confusing: Though video generation (which uses diffusion models) consumes a lot of energy, diffusion models themselves are in fact more efficient than transformers. Thus, by using a diffusion model instead of a transformer to generate text, Google DeepMind’s new LLM could be a lot more efficient than existing LLMs. Expect to see more from diffusion models in the near future!

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

AWS Fastnet cable will expand cloud reach, but CIOs must read the fine print

Cloud providers dive deep Over the last few years, the explosive growth in data and AI workloads has pushed hyperscalers to not just rely on shared infrastructure to meet the demand but also control the connectivity layer. “This is like backward integration for cloud providers such as AWS. As cloud

Palo Alto Networks readies security for AI-first world

Palo Alto has articulated the value of a security platform for several years. But now, given the speed at which AI is moving, the value shifts from cost consolidation to agility. With AI, most customers don’t know what their future operating environment will look like, and a platform approach lets

Chevron executives see 2025 production growth nearing 8%

Executives of Chevron Corp., Houston, expect the company’s 2025 production growth, excluding former Hess operations, to be near the top of their guidance range of 6-8%, they said Oct. 31. Chevron’s total production for the 3 months that ended Sept. 30 totaled nearly 4.09 MMboe/d compared with 3.37 MMboe/d in

Cisco unveils integrated edge platform for AI

Announced at Cisco’s Partner Summit, Unified Edge will likely be part of many third-party packages that can be configured in a variety of ways, Cisco stated. “The platform is customer definable. For example, if a customer has a workload and they’ve decided they want to use Nutanix, they can go

Strategists Forecast 6MM Barrel WoW USA Crude Stock Build

In an oil and gas report sent to Rigzone this week by the Macquarie team, Macquarie strategists, including Walt Chancellor, revealed that they are forecasting that U.S. crude inventories will be up by 6.2 million barrels for the week ending October 31. “This follows a 6.9 million barrel draw in the prior week, with the crude balance realizing significantly tighter than our expectations,” the strategists said in the report. “For this week’s balance, from refineries, we model a moderate increase in crude runs (+0.4 million barrels per day),” they added. “Among net imports, we model a large increase, with exports lower (-0.6 million barrels per day) and imports higher (+0.8 million barrels per day) on a nominal basis,” they continued. In the report, the strategists noted that the timing of cargoes remains a source of potential volatility in this week’s crude balance. “From implied domestic supply (prod.+adj.+transfers), we look for a bounce (+0.8 million barrels per day) on a nominal basis this week,” the analysts went on to state in the report. “Rounding out the picture, we anticipate a similar increase (+0.5 million barrels) in SPR [Strategic Petroleum Reserve] stocks this week,” they noted. The strategists also said in the report that, “among products” they “look for draws in gasoline (-2.5 million barrels) and distillate (-4.7 million barrels), with jet stocks up (+0.8 million barrels)”. “We model implied demand for these three products at ~14.6 million barrels per day for the week ending October 31,” they added. In its latest weekly petroleum status report at the time of writing, which was released on October 29 and included data for the week ending October 24, the U.S. Energy Information Administration (EIA) highlighted that U.S. commercial crude oil inventories, excluding those in the SPR, decreased by 6.9 million barrels from the week

ADNOC Set to Join Argentina LNG

Abu Dhabi National Oil Co PJSC (ADNOC) signed Tuesday a “non-binding framework agreement” to invest in YPF SA and Eni SpA’s project to export up to 12 million metric tons per annum (MMtpa) of natural gas from the Vaca Muerta field onshore Argentina. ADNOC through its global investment arm XRG will “evaluate participation” in Argentina LNG, XRG said in an online statement. “By joining forces with Eni’s world-class FLNG [floating liquefied natural gas] capabilities and YPF’s proven upstream leadership, we aim to set new benchmarks for innovation, scale and reliability in the international gas market”, said XRG international president for gas Mohamed Al Aryani. Italy’s state-backed Eni said separately the agreement signed Tuesday at the ADIPEC energy forum in Abu Dhabi paves the way for a “joint development agreement”. Last month Eni and Argentina’s state-owned YPF signed a “final technical project description”, bringing Argentina LNG closer to a final investment decision. “The project involves the production, processing, transportation and liquefaction of gas for export through two floating gas liquefaction units with a capacity of six MTPA (million tons per year, equivalent to approximately 9 billion cubic meters of gas per year) each, in addition to the valorization and export of associated liquids”, Eni said in a press release October 10. “Today’s agreement follows the head of agreement signed by the two companies in June 2025”. Announcing its initial agreement with YPF, Eni said June 6 Argentina LNG has plans to expand to 30 MMtpa by 2030. XRG added, “The non-binding framework agreement, signed during ADIPEC 2025, follows XRG’s recent investments in Mozambique’s Rovuma Basin, Block-1 Turkmenistan, Arcius Energy in Egypt, Absheron in Azerbaijan and the Rio Grande LNG project in the United States, reinforcing its ambition to become a leading global gas player”. ADNOC’s Gas Ambitions XRG aims to build

Russia in Talks with Turkey to Maintain Gas Flows

Russia and Turkey are in talks to keep up the volumes of gas supplies from Gazprom PJSC as they negotiate the renewal of two major pipeline supply deals, according to people familiar with the matter. The contracts between Russia’s gas giant and Turkey’s state company Botas for combined deliveries of as much as 21.75 billion cubic meters a year are set to expire on Dec. 31. Russia and Turkey are negotiating to keep the annual flows at about 22 billion cubic meters, the people said, asking not to be identified as the information isn’t public. Gazprom didn’t immediately respond to a Bloomberg request for comment sent during a public holiday in Russia. Turkey’s Energy Ministry didn’t comment. Botas didn’t reply to a query seeking comment. Gas market watchers have been questioning the future of Russian gas flows to Turkey amid growing pressure from US President Donald Trump’s administration to curb energy purchases that help the Kremlin fund its war on Ukraine. Following US sanctions on Russia’s two biggest oil producers last month, Turkey’s oil refiners have started cutting imports of Russian crude. Turkey has previously pushed back on Western efforts to stop it from buying Russian gas, which is mostly traded through long-term contracts via extensive pipeline connections between the two countries. In September, however, Turkey agreed to a string of contracts to buy liquefied natural gas, including from the US. With Turkey’s own production from the Black Sea set to grow, it may end up with more gas than it needs. Turkey’s large market has been a lifeline for Gazprom, which has all but lost the European gas market after the war triggered a push for diversification of supplies. This should give Turkey leverage to negotiate discounts in a renewal of supply deals. Last year, Gazprom shipped 21.6 billion

‘Disappointing’ Results for Melbana at Cuban Well

Melbana Energy Ltd said Wednesday flow testing at the Amistad-2 well in Cuba’s onshore Block 9 had failed to recover oil. “The testing of Amistad-2 is disappointing given the well was up-dip of known oil, but this can occur in the early-stage appraisal and development of new oilfields”, Melbana executive chair Andrew Purcell said in an online statement. “Oil shows were muted during the drilling, perhaps because the reservoir drilling fluid we have designed for these formations was in balance and doing its job, but well logs indicated good reservoir quality and reasonable oil saturation. Flow testing confirmed excellent reservoir quality, given the high rate of fluid recovery, but oil was residual at that location. “The rate of drilling was also quicker than prognosed, allowing us to continue drilling the encountered formation much deeper than originally planned”. The Sydney, Australia-based company exceeded its target total depth of 1,125 meters (3,690.94 feet) and reached 2,000 meters. Amistad-2 sits about 850 meters southwest and 200 meters up-dip of the already producing Alameda-2, also in Block 9, according to Melbana. However, pressure data from the latest drilling campaign “indicates that the reservoirs at the Amistad-2 location are not in communication with those at the Alameda-2 location”, Wednesday’s statement said. “Given the results of Amistad-2 consideration is now being given to Amistad-11 replacing Amistad-3 as the next well. This would be a shallow production well located on Pad 1, where good production characteristics have previously been obtained (peak flow of 1,903 bopd at a sustained rate of 1,235 bopd)”, Melbana added. “Production operations in Amistad-1 have been temporarily halted to prepare for the drilling of this well in case the joint operation approves this course of action”. Block 9 spans 2,344 square kilometers (905.02 square miles) on the north coast of Cuba, 140 kilometers

Shell Commits to Long-Term Purchase from Ruwais LNG

Abu Dhabi National Oil Co PJSC (ADNOC) said Tuesday it has signed a 15-year deal with Shell PLC to supply the British company up to one million metric tons per annum (MMtpa) of liquefied natural gas (LNG) from the Ruwais LNG project in the United Arab Emirates. “Signed during ADIPEC, the deal marks ADNOC’s first long-term LNG sales agreement with Shell and the eighth long-term offtake agreement secured for the Ruwais LNG project”, ADNOC said in a press release. “This SPA [sale and purchase agreement] converts a previous heads of agreement into a definitive agreement and marks a significant step in ADNOC’s efforts to rapidly commercialize the Ruwais LNG project. “With this latest agreement, more than eight MMtpa of the project’s planned 9.6 MMtpa capacity is now secured through long-term deals with customers across Asia and Europe, just 16 months after the project’s final investment decision in July 2024”. Fatema Al Nuaimi, chief executive of ADNOC gas processing and sales arm ADNOC Gas PLC, said, “While the industry can take up to four or five years to market such volumes, Ruwais is advancing at record pace”. “In parallel, construction, contractor mobilization and site works are all on track for commissioning by the end of 2028”, Al Nuaimi added. The export plant in Al Ruwais Industrial City is planned to have two trains, each with a production capacity of 4.8 MMtpa. Targeted to be put into production 2028, the facility would more than double ADNOC’s LNG capacity. Shell already holds a 10 percent stake in the project through Shell Overseas Holdings Ltd, ADNOC confirmed Tuesday. Last year ADNOC penned separate agreements farming out a total of 40 percent in Ruwais LNG to Shell, BP PLC, Mitsui & Co Ltd and TotalEnergies SE. Japan’s Mitsui also penned an offtake of 600,000 metric tons a year,

Oil Retreats on Strong Greenback

Oil fell, halting a four-session run of gains, pressured by a strong dollar and a backdrop of oversupply. West Texas Intermediate fell 0.8% to settle below $61 a barrel on Tuesday. A global equities rally hit a speed bump amid concerns about lofty valuations while the greenback climbed to the highest in more than five months, weighing on crude and other dollar-denominated commodities. Oil declined because of “the dollar funding stress and the second-order effect on global liquidity and, in turn, global growth,” said Jon Byrne, an analyst at Strategas Securities. The Organization of the Petroleum Exporting Countries and its allies said over the weekend they planned to hold back from lifting production quotas in the first quarter. The decision came as market observers brace for what is expected to be a global crude glut. The US oil benchmark has retreated almost 16% this year as OPEC+ and non-member nations ramped up production. Prices rebounded from five-month lows when the US recently announced sanctions on Rosneft PJSC and Lukoil PJSC, Russia’s two biggest oil companies, but have since surrendered some of those advances. Russian seaborne crude shipments fell sharply in the wake of the sanctions, dropping by the most since January 2024, according to data tracked by Bloomberg. Cargo discharges have been hit even harder than loadings, with oil held in tanker ships surging. Still, some are skeptical the restrictions will stop Russian oil from finding buyers. “Down the line, you will see that more and more of the disrupted Russian oil, one way or another, finds its way to the market,” Torbjörn Törnqvist, chief executive officer of Gunvor Group, said during an interview on Tuesday. “It always does somehow.” Eni SpA CEO Claudio Descalzi said Monday that any concerns about oversupply will be short-lived, the latest comments by an

Cisco centralizes customer experience around AI

The idea is to make sure enterprises are effectively choosing, implementing, and using the technologies they purchase to achieve their business goals, according to the company. Cisco CX offers a suite of services to help customers optimize their network infrastructure, security, collaboration, cloud and data center operations – from planning and design to implementation and maintenance. “For too long, the delivery of services has been fragmented, with support and professional services using different tools optimized for specific functions or lifecycle stages. This has led to a fragmented experience where customers, partners, and Cisco teams spend more time on data collection and tool maintenance than on high-value analysis,” wrote Bhaskar Jayakrishnan, senior vice president of engineering with the Cisco CX group in a blog about the new technology. “Historically, the handoffs between these stages have been inefficient. Designs are interpreted by humans and then converted into code. Operational data is manually analyzed to inform optimizations. This process is slow, error-prone, and loses critical context at every step.” “Cisco IQ represents a shift from this tool-centric model to an intelligence-centric one. It is a multi-persona system, serving customers, partners, and our own services teams through an API-first architecture. Our objective is to turn decades of institutional knowledge into a living, adaptive system that makes your infrastructure smarter, more resilient, and more secure,” Jayakrishnan wrote.

Data Center Jobs: Engineering, Construction, Commissioning, Sales, Field Service and Facility Tech Jobs Available in Major Data Center Hotspots

Each month Data Center Frontier, in partnership with Pkaza, posts some of the hottest data center career opportunities in the market. Here’s a look at some of the latest data center jobs posted on the Data Center Frontier jobs board, powered by Pkaza Critical Facilities Recruiting. Looking for Data Center Candidates? Check out Pkaza’s Active Candidate / Featured Candidate Hotlist.

Data Center Facility Technician (All Shifts Available)
Impact, TX
This position is also available in: Ashburn, VA; Abilene, TX; Needham, MA and New York, NY. Navy Nuke / Military Vets leaving service accepted! This opportunity is with a leading mission-critical data center provider. This firm provides data center solutions custom-fit to the requirements of its clients’ mission-critical operational facilities. It provides reliability for the mission-critical facilities of many of the world’s largest organizations, supporting enterprise clients, colo providers and hyperscale companies. This opportunity provides a career-growth minded role with exciting projects with leading-edge technology and innovation as well as competitive salaries and benefits.

Electrical Commissioning Engineer
Montvale, NJ
This traveling position is also available in: New York, NY; White Plains, NY; Richmond, VA; Ashburn, VA; Charlotte, NC; Atlanta, GA; Hampton, GA; Fayetteville, GA; New Albany, OH; Cedar Rapids, IA; Phoenix, AZ; Dallas, TX or Chicago, IL. *** ALSO looking for a LEAD EE and ME CxA Agents and CxA PMs. *** Our client is an engineering design and commissioning company that has a national footprint and specializes in MEP critical facilities design. They provide design, commissioning, consulting and management expertise in the critical facilities space. They have a mindset to provide reliability, energy efficiency, sustainable design and LEED expertise when providing these consulting services for enterprise, colocation and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive salaries and benefits.

Data Center MEP Construction

NVIDIA at GTC 2025: Building the AI Infrastructure of Everything

Omniverse DSX Blueprint Unveiled
Also at the conference, NVIDIA released a blueprint for how other firms should build massive, gigascale AI data centers, or AI factories, in which Oracle, Microsoft, Google, and other leading tech firms are investing billions. The most powerful and efficient of those, company representatives said, will include NVIDIA chips and software. A new NVIDIA AI Factory Research Center in Virginia will use that technology. This new “mega” Omniverse DSX Blueprint is a comprehensive, open blueprint for designing and operating gigawatt-scale AI factories. It combines design, simulation, and operations across factory facilities, hardware, and software.
• The blueprint expands to include libraries for building factory-scale digital twins, with Siemens’ Digital Twin software first to support the blueprint and FANUC and Foxconn Fii first to connect their robot models.
• Belden, Caterpillar, Foxconn, Lucid Motors, Toyota, Taiwan Semiconductor Manufacturing Co. (TSMC), and Wistron build Omniverse factory digital twins to accelerate AI-driven manufacturing.
• Agility Robotics, Amazon Robotics, Figure, and Skild AI build a collaborative robot workforce using NVIDIA’s three-computer architecture.

NVIDIA Quantum Gains
And then there’s quantum computing. It can help data centers become more energy-efficient and faster with specific tasks such as optimization and AI model training. Conversely, the unique infrastructure needs of quantum computers, such as power, cooling, and error correction, are driving the development of specialized quantum data centers. Huang said it’s now possible to make one logical qubit, or quantum bit, that’s coherent, stable, and error corrected. However, these qubits—the units of information enabling quantum computers to process information in ways ordinary computers can’t—are “incredibly fragile,” creating a need for powerful technology to do quantum error correction and infer the qubit’s state. To connect quantum and GPU computing, Huang announced the release of NVIDIA NVQLink — a quantum‑GPU interconnect that enables real‑time CUDA‑Q calls from quantum

The Evolution of the Neocloud: From Niche to Mainstream Hyperscale Challenger

Infrastructure and Supply Chain Race
Cloud competition is increasingly defined by the ability to secure power, land, and chips—three resources that dictate project timelines and customer onboarding. Neoclouds and hyperscalers face a common set of constraints: local utility availability, substation interconnection bottlenecks, and fierce competition for high-density GPU inventory. Power stands as the gating factor for expansion, often outpacing even chip shortages in severity. Facilities are increasingly being sited based on access to dedicated, reliable megawatt-scale electricity, rather than traditional latency zones or network proximity. AI growth forecasts point to four key ceilings: electrical capacity, chip procurement cycles, the latency wall between computation and data, and scalable data throughput for model training. With hyperscaler and neocloud deployments now competing for every available GPU from manufacturers, deployment agility has become a prime differentiator. Neoclouds distinguish themselves by orchestrating microgrid agreements, securing direct-source utility contracts, and compressing build-to-operational timelines. The ability to convert a bare site into a functional data hall on a shortened deployment timeline gives neoclouds a material edge over traditional hyperscale deployments, which require broader campus and network-level integration cycles. The aftereffects of the COVID-era supply chain disruptions linger, with legacy operators struggling to source critical electrical components, switchgear, and transformers, sometimes waiting more than a year for equipment. As a result, neocloud providers have moved aggressively into site selection strategies, regional partnerships, and infrastructure stack integration to hedge risk and shorten delivery cycles. Microgrid solutions and island modes for power supply are increasingly utilized to ensure uninterrupted access to electricity during ramp-up periods and supply chain outages, fundamentally rebalancing the competitive dynamics of AI infrastructure deployment.

Creditworthiness, Capital, and Risk Management
Securing capital remains a decisive factor for the growth and sustainability of neoclouds. Project finance for campus-scale deployments hinges on demonstrable creditworthiness; lenders demand

Canyon Magnet Energy: The Superconducting Future of Powering AI Data Centers

At this year’s Data Center Frontier Trends Summit, Honghai Song, founder of Canyon Magnet Energy, presented his company’s breakthrough superconducting magnet technology during the “6 Moonshot Trends for the 2026 Data Center Frontier” panel—showcasing how high-temperature superconductors (HTS) could reshape both fusion energy and AI data-center power systems. In this episode of the Data Center Frontier Show, Editor in Chief Matt Vincent speaks with Song about how Canyon Magnet Energy—founded in 2023 and based in New Jersey with research roots at Stony Brook University—is bridging fusion research and AI infrastructure through next-generation magnet and energy-storage technology.

From Fusion Research to Data Center Reality
Founded in 2023, Canyon Magnet Energy emerged from the advanced-magnet research ecosystem around Stony Brook and now operates a manufacturing line in Newark, New Jersey. Its team draws on decades of experience designing the ultra-strong magnetic fields that enable the confinement and stability of fusion plasma—but their ambitions go far beyond the laboratory. “Super magnets are the foundation of fusion,” Song explains in the interview. “But the same high-temperature superconductors that can make fusion practical can also dramatically improve how we move and store electricity in data centers.” The company’s magnets are built using REBCO (Rare Earth Barium Copper Oxide) tape, which operates at around 77 Kelvin—cold, but far warmer and more manageable than traditional low-temperature superconductors. The result is a zero-resistance pathway for electricity, unlocking new possibilities in power transmission, energy storage, and grid integration.

Why High-Temperature Superconductors Matter
Since their discovery in 1986, high-temperature superconductors have progressed from exotic physics experiments to industrial-scale wire and magnet manufacturing. Canyon Magnet Energy is among a new generation of companies moving this technology into the AI data-center context—where efficiency and instantaneous power responsiveness are increasingly critical. With AI training clusters consuming power at hundreds of megawatts per campus,

OpenAI spends even more money it doesn’t have

The aim, said Gogia, “is continuity, not cost efficiency. These deals are forward leaning, relying on revenue forecasts that remain speculative. In that context, OpenAI must continue to draw heavily on outside capital, whether through venture rounds, debt, or a future public offering.” He pointed out, “the company’s recent legal and corporate restructuring was designed to open the doors to that capital. Removing Microsoft’s exclusivity makes room for more vendors but also signals that no one provider can meet OpenAI’s demands. In several cases, suppliers are stepping in with financing arrangements that link product sales to future performance. While these strategies help close funding gaps, they introduce fragility. What looks like revenue is often pre-paid consumption, not realized margin.” Execution risks, he said, add to the concern. “Building and energizing enough data centers to meet OpenAI’s projected needs is not a function of ambition alone. It requires grid access, cooling capacity, and regional stability. Microsoft has acknowledged that it lacks the power infrastructure to fully deploy the GPUs it owns. Without physical readiness, all of these agreements sit on shaky ground.”

Lots of equity swapping going on
Scott Bickley, advisory fellow at Info-Tech Research Group, said he has not only been astounded by the funding announcements over the last few months, but is also appalled, primarily, he said, “because of the disconnect to what this does to the underlying technology stocks and their market prices versus where the technology is at from a development and ROI perspective … and from a boots on the ground perspective.” He added that while the financial pledges involve “huge, staggering numbers, most of them are tied up in ways that are not necessarily going to require all the cash to come from OpenAI. In a lot of cases, there is equity swapping. You have

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in upgrading data centers or opening new ones to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd.)

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation
AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends
It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
