Stay Ahead, Stay ONMINE

Nvidia says its Blackwell chips lead benchmarks in training AI LLMs

Nvidia is rolling out its AI chips to data centers and what it calls AI factories throughout the world, and the company announced today its Blackwell chips are leading the AI benchmarks.

Nvidia and its partners are speeding the training and deployment of next-generation AI applications that use the latest advancements in training and inference.

The Nvidia Blackwell architecture is built to meet the heightened performance requirements of these new applications. In the latest round of MLPerf Training, the 12th since the benchmark’s introduction in 2018, the Nvidia AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark’s toughest large language model (LLM)-focused test: Llama 3.1 405B pretraining.

Nvidia touted its performance on MLPerf training benchmarks.

The Nvidia platform was the only one that submitted results on every MLPerf Training v5.0 benchmark — underscoring its exceptional performance and versatility across a wide array of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks.

The at-scale submissions used two AI supercomputers powered by the Nvidia Blackwell platform: Tyche, built using Nvidia GB200 NVL72 rack-scale systems, and Nyx, based on Nvidia DGX B200 systems. In addition, Nvidia collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 Nvidia Grace CPUs.

On the new Llama 3.1 405B pretraining benchmark, Blackwell delivered 2.2 times greater performance compared with previous-generation architecture at the same scale.

Nvidia Blackwell is driving AI factories.

On the Llama 2 70B LoRA fine-tuning benchmark, Nvidia DGX B200 systems, powered by eight Blackwell GPUs, delivered 2.5 times more performance compared with a submission using the same number of GPUs in the prior round.

These performance leaps highlight advancements in the Blackwell architecture, including high-density liquid-cooled racks, 13.4TB of coherent memory per rack, fifth-generation Nvidia NVLink and Nvidia NVLink Switch interconnect technologies for scale-up and Nvidia Quantum-2 InfiniBand networking for scale-out. Plus, innovations in the Nvidia NeMo Framework software stack raise the bar for next-generation multimodal LLM training, critical for bringing agentic AI applications to market.

These agentic AI-powered applications will one day run in AI factories — the engines of the agentic AI economy. These new applications will produce tokens and valuable intelligence that can be applied to almost every industry and academic domain.

The Nvidia data center platform includes GPUs, CPUs, high-speed fabrics and networking, as well as a vast array of software like Nvidia CUDA-X libraries, the NeMo Framework, Nvidia TensorRT-LLM and Nvidia Dynamo. This highly tuned ensemble of hardware and software technologies empowers organizations to train and deploy models more quickly, dramatically accelerating time to value.

Blackwell is handily beating its predecessor Hopper in AI training.

The Nvidia partner ecosystem participated extensively in this MLPerf round. Beyond the submission with CoreWeave and IBM, other compelling submissions were from ASUS, Cisco, Giga Computing, Lambda, Lenovo, Quanta Cloud Technology and Supermicro.

These were the first MLPerf Training submissions using GB200. The benchmarks are developed by the MLCommons Association, which has more than 125 members and affiliates. Its time-to-train metric ensures the training process produces a model that meets the required accuracy, and its standardized benchmark run rules ensure apples-to-apples performance comparisons. The results are peer-reviewed before publication.

The basics on training benchmarks

Nvidia is getting great scaling on its latest AI processors.

Dave Salvator is someone I knew when he was part of the tech press. Now he is director of accelerated computing products in the Accelerated Computing Group at Nvidia. In a press briefing, Salvator noted that Nvidia CEO Jensen Huang talks about several types of scaling laws for AI. They include pre-training, where you’re basically teaching the AI model knowledge. That’s starting from zero. It’s a heavy computational lift that is the backbone of AI, Salvator said.

From there, Nvidia moves into post-training scaling. This is where models kind of go to school, and where you can do things like fine-tuning: you bring in a different data set to give a pre-trained model additional domain knowledge from your particular data.
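The Llama 2 70B benchmark mentioned earlier uses LoRA (low-rank adaptation), one common way to do this kind of fine-tuning. As a rough sketch of why it is cheaper than updating every weight: LoRA freezes a weight matrix of shape d x k and trains two small matrices, of shapes d x r and r x k, in its place. The layer shape and rank below are hypothetical values chosen for illustration; they are not taken from the benchmark configuration.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable weights when the whole d x k matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable weights with LoRA: B is d x r and A is r x k; W stays frozen."""
    return r * (d + k)


# Hypothetical layer shape and rank (assumptions, not benchmark settings).
d, k, r = 8192, 8192, 16
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x fewer")
```

With these assumed numbers, the adapter trains 256 times fewer weights for that one layer than full fine-tuning would.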

Nvidia has moved on from just chips to building AI infrastructure.

And then lastly, there is test-time scaling, or reasoning, sometimes called long thinking. The other term this goes by is agentic AI. It’s AI that can actually think, reason and solve problems, rather than one where you basically ask a question and get a relatively simple answer. Test-time scaling and reasoning can work on much more complicated tasks and deliver rich analysis.

And then there is also generative AI, which can generate content on an as-needed basis. That can include text summarization and translation, but also visual content and even audio content. There are a lot of types of scaling that go on in the AI world. For the benchmarks, Nvidia focused on pre-training and post-training results.

“That’s where AI begins what we call the investment phase of AI. And then when you get into inferencing and deploying those models and then generating basically those tokens, that’s where you begin to get your return on your investment in AI,” he said.

The MLPerf benchmark is in its 12th round and it dates back to 2018. The consortium backing it has over 125 members and it’s been used for both inference and training tests. The industry sees the benchmarks as robust.

“As I’m sure a lot of you are aware, sometimes performance claims in the world of AI can be a bit of the Wild West. MLPerf seeks to bring some order to that chaos,” Salvator said. “Everyone has to do the same amount of work. Everyone is held to the same standard in terms of convergence. And once results are submitted, those results are then reviewed and vetted by all the other submitters, and people can ask questions and even challenge results.”

The most intuitive metric around training is how long it takes to train an AI model to what’s called convergence. That means hitting a specified level of accuracy. It’s an apples-to-apples comparison, Salvator said, and it takes into account constantly changing workloads.
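The time-to-train idea can be sketched in a few lines of Python. This is an illustration only, not MLPerf’s reference harness: `time_to_train`, `speedup` and the `ToyRun` class are hypothetical names, and the toy accuracy curve and step timings are made-up numbers. The sketch shows the clock stopping the first time the quality target is reached, and a speedup computed as the ratio of two such times.

```python
import time
from typing import Callable, Optional


def time_to_train(
    train_step: Callable[[], None],
    evaluate: Callable[[], float],
    target_accuracy: float,
    max_steps: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return elapsed seconds until evaluate() first reaches target_accuracy.

    Returns None if the target is not reached within max_steps, meaning the
    run did not converge and yields no valid result.
    """
    start = clock()
    for _ in range(max_steps):
        train_step()
        if evaluate() >= target_accuracy:
            return clock() - start
    return None


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new system reached the same target."""
    return baseline_seconds / new_seconds


class ToyRun:
    """Hypothetical stand-in for a real training job (illustration only)."""

    def __init__(self, seconds_per_step: float, gain_per_step: float) -> None:
        self.seconds_per_step = seconds_per_step
        self.gain_per_step = gain_per_step
        self.now = 0.0        # simulated wall clock, in seconds
        self.accuracy = 0.0   # simulated model quality

    def step(self) -> None:
        self.now += self.seconds_per_step
        self.accuracy = min(1.0, self.accuracy + self.gain_per_step)

    def evaluate(self) -> float:
        return self.accuracy

    def clock(self) -> float:
        return self.now


# Two made-up systems that need the same number of steps to hit the target,
# but differ in how long each step takes.
older = ToyRun(seconds_per_step=5.0, gain_per_step=0.125)
newer = ToyRun(seconds_per_step=2.0, gain_per_step=0.125)
t_old = time_to_train(older.step, older.evaluate, 0.75, 100, older.clock)
t_new = time_to_train(newer.step, newer.evaluate, 0.75, 100, newer.clock)
print(t_old, t_new, speedup(t_old, t_new))  # 30.0 12.0 2.5
```

Because both runs must hit the same target before the clock stops, a faster system cannot win by stopping early at lower accuracy, which is the point of measuring to convergence.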

This year, there’s a new Llama 3.1 405B workload, which replaces the GPT-3 175B workload that was in the benchmark previously. In the benchmarks, Salvator noted, Nvidia set a number of records. The Nvidia GB200 NVL72 AI factories are fresh from the fabrication factories. From one generation of chips (Hopper) to the next (Blackwell), Nvidia saw a 2.5 times improvement for image generation results.

“We’re still fairly early in the Blackwell product life cycle, so we fully expect to be getting more performance over time from the Blackwell architecture, as we continue to refine our software optimizations and as new, frankly heavier workloads come into the market,” Salvator said.

He noted Nvidia was the only company to have submitted entries for all benchmarks.

“The great performance we’re achieving comes through a combination of things. It’s our fifth-gen NVLink and NVSwitch delivering up to 2.66 times more performance, along with other just general architectural goodness in Blackwell, along with just our ongoing software optimizations that make that performance possible,” Salvator said.

He added, “Because of Nvidia’s heritage, we have been known for the longest time as those GPU guys. We certainly make great GPUs, but we have gone from being just a chip company to not only being a system company with things like our DGX servers, to now building entire racks and data centers with things like our rack designs, which are now reference designs to help our partners get to market faster, to building entire data centers, which ultimately then build out entire infrastructure, which we then are now referring to as AI factories. It’s really been this really interesting journey.”

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

ExxonMobil bumps up 2030 target for Permian production

ExxonMobil Corp., Houston, is looking to grow production in the Permian basin to about 2.5 MMboe/d by 2030, an increase of 200,000 boe/d from executives’ previous forecasts and a jump of more than 45% from this year’s output. Helping drive that higher target is an expected 2030 cost profile that

Read More »

OPEC Data Points to Balanced Global Oil Market in 2026

OPEC kept forecasts for global oil supplies and demand in 2026 steady, pointing to a balanced world market that clashes with widespread predictions of a surplus. The Organization of the Petroleum Exporting Countries and its allies will need to produce an average of 43 million barrels a day next year to balance supply and demand, roughly in line with the amount pumped last month, according to a report on OPEC’s website. This runs counter to prevailing industry expectations for a supply excess in 2026. Top trader Trafigura Group said this week it could amount to a “super glut,” and the International Energy Agency — while paring its projections in its report earlier Thursday — continues to expect a record overhang. Key OPEC+ nations led by Saudi Arabia acknowledged the fragile backdrop last month by agreeing to pause further output increases during the first quarter after rapidly ramping up production earlier this year.  The outlook from OPEC’s Vienna-based secretariat has proven excessively bullish in recent years. Last year, OPEC was ultimately forced to slash demand projections by 32% over the course of six monthly downgrades. In late 2023, it forecast a record inventory deficit that never materialized. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed.

Read More »

Antero adds to Marcellus portfolio, Infinity picks up divested Ohio Utica interests

Antero Resources Corp., Denver, Co., has signed deals to expand its Marcellus shale footprint in West Virginia and to divest its certain Ohio Utica shale assets. Adding the Marcellus assets expands Antero Resources’ core acreage position, enhancing its position “as the premier liquids developer in the Marcellus,” and provides the company “with further dry gas optionality for local demand from data centers and natural gas fired power plants,” said Michael Kennedy, president and chief executive officer, in a release Dec. 8. Marcellus acquisition from HG Energy Through a deal to acquire the upstream assets of HG Energy II LLC, Parkersburg, WV, Antero aims to add 850 MMcfed of expected Marcellus production in 2026. The deal, expected to close in second-quarter 2026, was signed for $2.8 billion in cash plus the assumption of HG Energy’s commodity hedge book. Antero said about 90% of HG natural gas production is hedged in 2026 and 2027 at average NYMEX prices of $4.00 and $3.88, respectively. The deal adds 385,000 net acres offsetting Antero’s existing 475,000 net core Marcellus acreage position and includes over 400 additional locations that immediately compete for capital (75% liquids), the company said in a related investor presentation.  Antero said it anticipates capital synergies of about $550 million inclusive of development planning optimization and drilling and completions savings. Another $400 in income-related synergies is expected. Separately, Antero Midstream agreed to acquire the midstream assets from HG Energy for $1.1 billion in cash. The deal includes about 50 miles of bi-directional dry and rich gas gathering pipelines and water assets in which Antero plans to invest about $25 million to integrate with its legacy gathering and water system. Utica sale to Infinity Natural Resources Infinity Natural Resources Inc., in a release Dec. 8, said subsidiary Infinity Natural Resources LLC will acquire upstream and

Read More »

Market Focus: Oversupply takes center stage, fundamentals catch up with the market

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } <!–> In this Market Focus episode of the Oil & Gas Journal ReEnterprised podcast, Conglin Xu, managing editor, economics, takes a look at the growing oversupply in global crude markets and the shift now under way as fundamentals begin overtaking sentiment and geopolitics as the primary price driver. ]–>

Read More »

Aramco, ExxonMobil weigh new chemical complex for Samref refinery

Saudi Aramco and partner ExxonMobil Corp. subsidiary Mobil Yanbu Refining Co. Inc. are discussing the possibility of executing a major overhaul and expansion of 50-50 joint venture Saudi Aramco-Mobil Refinery Co. Ltd.’s (Samref) 400,000-b/d Samref refinery in Yanbu, Saudi Arabia. As part of a venture framework agreement (VFA) signed on Dec. 8, the partners will evaluate potential capital investments to expand and diversify the refinery’s existing production slate, including the addition of a grassroots petrochemical complex at the site, Aramco said in a statement. In addition to upgrading and diversifying Samref’s production to include lower-emission, high-quality distillates and high-performance chemicals, the project scope would involve works to improve the refinery’s energy efficiency and implement a sitewide integrated emissions reduction strategy, according to Aramco. With the VFA now signed, the companies said they will begin the project’s preliminary front-end engineering and design (pre-FEED) study, which will focus on opportunities to maximize the site’s operational advantage and enhance its competitiveness while meeting Saudi Arabia’s growing demand for high-quality petrochemical products. For Aramco, the proposed project—the design of which aims to increase the conversion of crude oil and other petroleum liquids into higher-value chemicals—further reinforces the company’s commitment to creating further value of its overall downstream business as well as its liquids-to-chemicals strategy, according to Mohammed Y. Al Qahtani, Aramco’s downstream president. “[The proposed expansion and integration project] will also position Samref as a key driver in the growth of [Saudi Arabia’s] petrochemical sector,” Al Qahtani added. 
Without disclosing a timeline as to when the partners expect to complete the pre-FEED study or reach final investment decision, Aramco confirmed existing plans for the potential project would remain subject to market conditions and necessary regulatory approvals. Samref previously completed modifications and renovations at the Yanbu refinery in 2014-15 related to a two-phased clean-fuels project

Read More »

Harbour Energy to add North Sea assets through Waldorf acquisition

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } Harbour Energy plc has agreed to acquire substantially all the subsidiaries of Waldorf Energy Partners Ltd. and Waldorf Production Ltd., currently in administration, for $170 million. The company, in a release Dec. 12, said the deal would add oil-weighted production of 20,000 boe/d and 2P reserves of 35 MMboe. In addition, the deal would increase Harbour’s interest in its operated Catcher oil and gas field to 90% from 50% and provide a new production base  for Harbour in the northern North Sea with the addition of a 29.5% non-operated interest in the EnQuest plc-operated Kraken oil field. 
The deal is expected to close in second-quarter 2026, subject to regulatory approvals and full and final settlement of all creditor claims against Waldorf’s subsidiaries.

Read More »

EIA: US oil inventories drop 1.8 million bbl

US commercial crude inventories for the week ended Dec. 5, excluding those in the Strategic Petroleum Reserve, dropped 1.8 million bbl from the previous week to 425.7 million bbl, which is about 4% below the average range for this time of year, according to the US Energy Information Administration’s (EIA) Weekly Petroleum Status Report. Total motor gasoline inventories gained 6.4 million bbl last week and are about 1% below the 5-year average range for this time of year. Finished gasoline inventories and blending components inventories rose. Distillate fuel inventories increased by 2.5 million bbl but are 7% below the 5-year average for this time of year. EIA reported that US crude refinery inputs last week averaged 16.9 million b/d, down 17,000 b/d from the previous week’s average. Refineries operated at 94.5% of their operable capacity. Gasoline production decreased to 9.6 million b/d, while distillate fuel production increased by 380,000 b/d, averaging 5.4 million b/d. US crude imports averaged 6.6 million b/d, up 609,000 b/d from the previous week’s average. Over the last 4 weeks, crude imports averaged 6.2 million b/d, down 7.7% from the same 4-week period last year. Total motor gasoline imports, including both finished gasoline and gasoline blending components, averaged 659,000 b/d. Distillate fuel imports averaged 181,000 b/d last week.

Read More »

Executive Roundtable: Converging Disciplines in the AI Buildout

At Data Center Frontier, we rely on industry leaders to help us understand the most urgent challenges facing digital infrastructure. And in the fourth quarter of 2025, the data center industry is adjusting to a new kind of complexity.  AI-scale infrastructure is redefining what “mission critical” means, from megawatt density and modular delivery to the chemistry of cooling fluids and the automation of energy systems. Every project has arguably in effect now become an ecosystem challenge, demanding that electrical, mechanical, construction, and environmental disciplines act as one.  For this quarter’s Executive Roundtable, DCF convened subject matter experts from Ecolab, EdgeConneX, Rehlko and Schneider Electric – leaders spanning the full chain of facilities design, deployment, and operation. Their insights illuminate how liquid cooling, energy management, and sustainable process design in data centers are now converging to set the pace for the AI era. Our distinguished executive panelists for this quarter include: Rob Lowe, Director RD&E – Global High Tech, Ecolab Phillip Marangella, Chief Marketing and Product Officer, EdgeConneX Ben Rapp, Manager, Strategic Project Development, Rehlko Joe Reele, Vice President, Datacenter Solution Architects, Schneider Electric Today: Engineering the New Normal – Liquid Cooling at Scale  Today’s kickoff article grapples with how, as liquid cooling technology transitions to default hyperscale design, the challenge is no longer if, but how to scale builds safely, repeatably, and globally.  Cold plates, immersion, dielectric fluids, and liquid-to-chip loops are converging into factory-integrated building blocks, yet variability in chemistry, serviceability, materials, commissioning practices, and long-term maintenance threatens to fragment adoption just as demand accelerates.  Success now hinges on shared standards and tighter collaboration across OEMs, builders, and process specialists worldwide. 
So how do developers coordinate across the ecosystem to make liquid cooling a safe, maintainable global default? What’s Ahead in the Roundtable Over the coming days, our panel

Read More »

DCF Trends Summit 2025: AI for Good – How Operators, Vendors and Cooling Specialists See the Next Phase of AI Data Centers

At the 2025 Data Center Frontier Trends Summit (Aug. 26-28) in Reston, Va., the conversation around AI and infrastructure moved well past the hype. In a panel sponsored by Schneider Electric—“AI for Good: Building for AI Workloads and Using AI for Smarter Data Centers”—three industry leaders explored what it really means to design, cool and operate the new class of AI “factories,” while also turning AI inward to run those facilities more intelligently. Moderated by Data Center Frontier Editor in Chief Matt Vincent, the session brought together: Steve Carlini, VP, Innovation and Data Center Energy Management Business, Schneider Electric Sudhir Kalra, Chief Data Center Operations Officer, Compass Datacenters Andrew Whitmore, VP of Sales, Motivair Together, they traced both sides of the “AI for Good” equation: building for AI workloads at densities that would have sounded impossible just a few years ago, and using AI itself to reduce risk, improve efficiency and minimize environmental impact. From Bubble Talk to “AI Factories” Carlini opened by acknowledging the volatility surrounding AI investments, citing recent headlines and even Sam Altman’s public use of the word “bubble” to describe the current phase of exuberance. “It’s moving at an incredible pace,” Carlini noted, pointing out that roughly half of all VC money this year has flowed into AI, with more already spent than in all of the previous year. Not every investor will win, he said, and some companies pouring in hundreds of billions may not recoup their capital. But for infrastructure, the signal is clear: the trajectory is up and to the right. GPU generations are cycling faster than ever. Densities are climbing from high double-digits per rack toward hundreds of kilowatts. The hyperscale “AI factories,” as NVIDIA calls them, are scaling to campus capacities measured in gigawatts. Carlini reminded the audience that in 2024,

Read More »

FinOps Foundation sharpens FOCUS to reduce cloud cost chaos

“The big change that’s really started to happen in late 2024 early 2025 is that the FinOps practice started to expand past the cloud,” Storment said. “A lot of organizations got really good at using FinOps to manage the value of cloud, and then their organizations went, ‘oh, hey, we’re living in this happily hybrid state now where we’ve got cloud, SaaS, data center. Can you also apply the FinOps practice to our SaaS? Or can you apply it to our Snowflake? Can you apply it to our data center?’” The FinOps Foundation’s community has grown to approximately 100,000 practitioners. The organization now includes major cloud vendors, hardware providers like Nvidia and AMD, data center operators and data cloud platforms like Snowflake and Databricks. Some 96 of the Fortune 100 now participate in FinOps Foundation programs. The practice itself has shifted in two directions. It has moved left into earlier architectural and design processes, becoming more proactive rather than reactive. It has also moved up organizationally, from director-level cloud management roles to SVP and COO positions managing converged technology portfolios spanning multiple infrastructure types. This expansion has driven the evolution of FOCUS beyond its original cloud billing focus. Enterprises are implementing FOCUS as an internal standard for chargeback reporting even when their providers don’t generate native FOCUS data. Some newer cloud providers, particularly those focused on AI infrastructure, are using the FOCUS specification to define their billing data structures from the ground up rather than retrofitting existing systems. The FOCUS 1.3 release reflects this maturation, addressing technical gaps that have emerged as organizations apply cost management practices across increasingly complex hybrid environments. 
FOCUS 1.3 exposes cost allocation logic for shared infrastructure

The most significant technical enhancement in FOCUS 1.3 addresses a gap in how shared infrastructure costs are allocated and

Read More »

Aetherflux joins the race to launch orbital data centers by 2027

Enterprises will connect to and manage orbital workloads “the same way they manage cloud workloads today,” using optical links, the spokesperson added. The company’s approach is to “continuously launch new hardware and quickly integrate the latest architectures,” with older systems running lower-priority tasks to serve out the full useful lifetime of their high-end GPUs. The company declined to disclose pricing. Aetherflux plans to launch about 30 satellites at a time on SpaceX Falcon 9 rockets. Before the data center launch, the company will launch a power-beaming demonstration satellite in 2026 to test transmission of one kilowatt of energy from orbit to ground stations, using infrared lasers. Competition in the sector has intensified in recent months. In November, Starcloud launched its Starcloud-1 satellite carrying an Nvidia H100 GPU, which is 100 times more powerful than any previous GPU flown in space, according to the company, and demonstrated running Google’s Gemma AI model in orbit. In the same month, Google announced Project Suncatcher, with a 2027 demonstration mission planned.

Analysts see limited near-term applications

Despite the competitive activity, orbital data centers won’t replace terrestrial cloud regions for general hosting through 2030, said Ashish Banerjee, senior principal analyst at Gartner. Instead, they suit specific workloads, including meeting data sovereignty requirements for jurisdictionally complex scenarios, offering disaster recovery immune to terrestrial risks, and providing asynchronous high-performance computing, he said. “Orbital centers are ideal for high-compute, low-I/O batch jobs,” Banerjee said. “Think molecular folding simulations for pharma, massive Monte Carlo financial simulations, or training specific AI model weights. If the job takes 48 hours, the 500ms latency penalty of LEO is irrelevant.” One immediate application involves processing satellite-generated data in orbit, he said.
Earth observation satellites using synthetic aperture radar generate roughly 10 gigabytes per second, but limited downlink bandwidth creates bottlenecks. Processing data in

Read More »

Here’s what Oracle’s soaring infrastructure spend could mean for enterprises

He said he had earlier told analysts in a separate call that margins for AI workloads in these data centers would be in the 30% to 40% range over the life of a customer contract. Kehring gave assurances that there would be demand for the data centers when they were completed, pointing to Oracle’s increasing remaining performance obligations (services contracted but not yet delivered), which were up $68 billion on the previous quarter, and saying that Oracle has been seeing unprecedented demand for AI workloads driven by the likes of Meta and Nvidia.

Rising debt and margin risks raise flags for CIOs

For analysts, though, the swelling debt load is hard to dismiss, even with Oracle’s attempts to de-risk its spend and squeeze more efficiency out of its buildouts. Gogia sees Oracle already under pressure, with the financial ecosystem around the company pricing in the risk of one of the largest debts in corporate history, which crossed $100 billion even before the capex spend this quarter. That pricing is evident in the rising cost of insuring the debt and the shift in credit outlook. “The combination of heavy capex, negative free cash flow, increasing financing cost and long-dated revenue commitments forms a structural pressure that will invariably find its way into the commercial posture of the vendor,” Gogia said, hinting at an “eventual” increase in pricing of the company’s offerings. He was equally unconvinced by Magouyrk’s assurances about the margin profile of AI workloads, as he believes that AI infrastructure, particularly GPU-heavy clusters, delivers significantly lower margins in the early years because utilisation takes time to ramp.

Read More »

New Nvidia software gives data centers deeper visibility into GPU thermals and reliability

Addressing the challenge

Modern AI accelerators now draw more than 700W per GPU, and multi-GPU nodes can reach 6kW, creating concentrated heat zones, rapid power swings, and a higher risk of interconnect degradation in dense racks, according to Manish Rawat, semiconductor analyst at TechInsights. Traditional cooling methods and static power planning increasingly struggle to keep pace with these loads. “Rich vendor telemetry covering real-time power draw, bandwidth behavior, interconnect health, and airflow patterns shifts operators from reactive monitoring to proactive design,” Rawat said. “It enables thermally aware workload placement, faster adoption of liquid or hybrid cooling, and smarter network layouts that reduce heat-dense traffic clusters.” Rawat added that the software’s fleet-level configuration insights can also help operators catch silent errors caused by mismatched firmware or driver versions. This can improve training reproducibility and strengthen overall fleet stability. “Real-time error and interconnect health data also significantly accelerates root-cause analysis, reducing MTTR and minimizing cluster fragmentation,” Rawat said. These operational pressures can shape budget decisions and infrastructure strategy at the enterprise level.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading data centers or opening new ones to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote $200 billion between them to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.) John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »