Introducing Gemma 3n: The developer guide

The first Gemma model launched early last year and has since grown into a thriving Gemmaverse of over 160 million collective downloads. This ecosystem includes our family of over a dozen specialized models for everything from safeguarding to medical applications and, most inspiringly, the countless innovations from the community. From innovators like Roboflow building enterprise computer vision to the Institute of Science Tokyo creating highly capable Japanese Gemma variants, your work has shown us the path forward.

Building on this incredible momentum, we’re excited to announce the full release of Gemma 3n. While last month’s preview offered a glimpse, today unlocks the full power of this mobile-first architecture. Gemma 3n is designed for the developer community that helped shape Gemma. It’s supported by your favorite tools including Hugging Face Transformers, llama.cpp, Google AI Edge, Ollama, MLX, and many others, enabling you to fine-tune and deploy for your specific on-device applications with ease. This post is the developer deep dive: we’ll explore some of the innovations behind Gemma 3n, share new benchmark results, and show you how to start building today.

What’s new in Gemma 3n?

Gemma 3n represents a major advancement for on-device AI, bringing powerful multimodal capabilities to edge devices with performance previously only seen in last year’s cloud-based frontier models.

Achieving this leap in on-device performance required rethinking the model from the ground up. The foundation is Gemma 3n’s unique mobile-first architecture, and it all starts with MatFormer.

MatFormer: One model, many sizes

At the core of Gemma 3n is the MatFormer (🪆Matryoshka Transformer) architecture, a novel nested transformer built for elastic inference. Think of it like Matryoshka dolls: a larger model contains smaller, fully functional versions of itself. This approach extends the concept of Matryoshka Representation Learning from just embeddings to all transformer components.

During the MatFormer training of the 4B effective parameter (E4B) model, a 2B effective parameter (E2B) sub-model is simultaneously optimized within it, as shown in the figure above. This provides developers two powerful capabilities and use cases today:

1: Pre-extracted models: You can directly download and use either the main E4B model for the highest capabilities, or the standalone E2B sub-model which we have already extracted for you, offering up to 2x faster inference.

2: Custom sizes with Mix-n-Match: For more granular control tailored to specific hardware constraints, you can create a spectrum of custom-sized models between E2B and E4B using a method we call Mix-n-Match. This technique allows you to precisely slice the E4B model’s parameters, primarily by adjusting the feed forward network hidden dimension per layer (from 8192 to 16384) and selectively skipping some layers. We are releasing the MatFormer Lab, a tool that shows how to retrieve these optimal models, which were identified by evaluating various settings on benchmarks like MMLU.

MMLU scores for the pre-trained Gemma 3n checkpoints at different model sizes (using Mix-n-Match)
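
The slicing idea behind Mix-n-Match can be sketched in a few lines. This is a minimal illustration, not the MatFormer Lab implementation: it assumes a sub-model keeps the leading hidden units of each feed-forward layer, it uses a toy two-matrix feed-forward block, and every function and variable name is hypothetical. Only the 8192 to 16384 width range comes from the text above.

```python
# Illustrative Mix-n-Match sketch (hypothetical helper names, not the MatFormer Lab API).
# Assumption: a sub-model keeps the first `width` hidden units of each FFN layer.

FFN_MIN, FFN_MAX = 8192, 16384  # per-layer hidden-dimension range quoted in the post


def validate_plan(ffn_widths, skip_layers):
    """Check a per-layer slicing plan: every kept layer needs a width within range."""
    for layer, width in enumerate(ffn_widths):
        if layer in skip_layers:
            continue  # skipped layers are dropped, so their width is ignored
        if not FFN_MIN <= width <= FFN_MAX:
            raise ValueError(f"layer {layer}: width {width} outside [{FFN_MIN}, {FFN_MAX}]")
    return True


def slice_ffn(w_in, w_out, width):
    """Keep the first `width` hidden units of one toy feed-forward block.

    w_in  has shape [hidden][model]  (one row per hidden unit)
    w_out has shape [model][hidden]  (one column per hidden unit)
    """
    sliced_in = w_in[:width]
    sliced_out = [row[:width] for row in w_out]
    return sliced_in, sliced_out


def ffn_hidden_fraction(ffn_widths, skip_layers):
    """Fraction of the full model's FFN hidden units that the plan keeps."""
    kept = sum(w for i, w in enumerate(ffn_widths) if i not in skip_layers)
    return kept / (FFN_MAX * len(ffn_widths))


# Toy plan: 4 layers, two at the small width, one at full width, one skipped.
plan_widths = [8192, 8192, 16384, 16384]
plan_skips = {3}
plan_ok = validate_plan(plan_widths, plan_skips)
plan_fraction = ffn_hidden_fraction(plan_widths, plan_skips)

# Slice a tiny stand-in FFN (4 hidden units, model dimension 2) down to 2 hidden units.
toy_w_in = [[1, 2], [3, 4], [5, 6], [7, 8]]
toy_w_out = [[1, 2, 3, 4], [5, 6, 7, 8]]
small_in, small_out = slice_ffn(toy_w_in, toy_w_out, 2)
```

A real plan would be scored on a benchmark before use. The sketch only shows that a custom size amounts to a list of per-layer widths plus a set of skipped layers.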

Looking ahead, the MatFormer architecture also paves the way for elastic execution. While not part of today’s launched implementations, this capability allows a single deployed E4B model to dynamically switch between E4B and E2B inference paths on the fly, enabling real-time optimization of performance and memory usage based on the current task and device load.

Per-Layer Embeddings (PLE): Unlocking more memory efficiency

Gemma 3n models incorporate Per-Layer Embeddings (PLE). This innovation is tailored for on-device deployment as it dramatically improves model quality without increasing the high-speed memory footprint required on your device’s accelerator (GPU/TPU).

While the Gemma 3n E2B and E4B models have a total parameter count of 5B and 8B respectively, PLE allows a significant portion of these parameters (the embeddings associated with each layer) to be loaded and computed efficiently on the CPU. This means only the core transformer weights (approximately 2B for E2B and 4B for E4B) need to sit in the typically more constrained accelerator memory (VRAM).

With Per-Layer Embeddings, you can use Gemma 3n E2B while only having ~2B parameters loaded in your accelerator.
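
To make the memory split concrete, here is a back-of-the-envelope sketch. The parameter counts are the approximate figures quoted above. The bytes-per-parameter value is an assumption about the weight format, and treating every parameter outside the core transformer weights as living off the accelerator is a simplification. The function names are hypothetical.

```python
# Rough accelerator-memory estimate with Per-Layer Embeddings (PLE).
# Parameter counts are the approximate figures from the post; bytes_per_param
# is an assumed weight format, not a published number.

MODELS = {
    "E2B": {"total_params": 5e9, "core_params": 2e9},
    "E4B": {"total_params": 8e9, "core_params": 4e9},
}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def ple_split(model_name, bytes_per_param):
    """Return (accelerator_gb, off_accelerator_gb) for the weights.

    Simplification: everything that is not a core transformer weight is
    counted as off-accelerator.
    """
    model = MODELS[model_name]
    core = model["core_params"]
    rest = model["total_params"] - core
    return weight_memory_gb(core, bytes_per_param), weight_memory_gb(rest, bytes_per_param)


# Assuming 16-bit weights (2 bytes per parameter).
e2b_accel_gb, e2b_rest_gb = ple_split("E2B", bytes_per_param=2)
e4b_accel_gb, e4b_rest_gb = ple_split("E4B", bytes_per_param=2)
```

Under these assumptions E2B needs about 4 GB of accelerator memory for weights instead of about 10 GB for all 5B parameters. Activations and the KV cache are not included.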

KV Cache sharing: Faster long-context processing

Processing long inputs, such as the sequences derived from audio and video streams, is essential for many advanced on-device multimodal applications. Gemma 3n introduces KV Cache Sharing, a feature designed to significantly accelerate time-to-first-token for streaming response applications.

KV Cache Sharing optimizes how the model handles the initial input processing stage (often called the “prefill” phase). The keys and values of the middle layer from local and global attention are directly shared with all the top layers, delivering a notable 2x improvement on prefill performance compared to Gemma 3 4B. This means the model can ingest and understand lengthy prompt sequences much faster than before.
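
The sharing pattern described above can be illustrated with a toy prefill loop. This is a sketch under stated assumptions, not the Gemma 3n implementation: the layer pattern, layer count, and split point are made up, and the cached values are placeholder strings rather than tensors.

```python
# Toy illustration of KV cache sharing during prefill (not the Gemma 3n implementation).
# Assumptions: the layer pattern, layer count, and split point are invented for the demo.


def prefill_kv(layer_types, first_shared_layer):
    """Return the per-layer KV entries and how many KV computations were performed.

    Layers below `first_shared_layer` compute their own keys and values. Layers at
    or above it reuse the KV of the last lower layer with the same attention type.
    """
    cache = []
    last_kv_by_type = {}
    computed = 0
    for index, attn_type in enumerate(layer_types):
        if index < first_shared_layer:
            kv = f"kv[{index}]"  # stands in for computing the K and V projections
            computed += 1
            last_kv_by_type[attn_type] = kv
        else:
            kv = last_kv_by_type[attn_type]  # shared: no new K/V computation
        cache.append(kv)
    return cache, computed


# Hypothetical 8-layer stack alternating local and global attention.
toy_layers = ["local", "global"] * 4
toy_cache, toy_computed = prefill_kv(toy_layers, first_shared_layer=4)
```

In this toy stack only four of the eight layers compute keys and values during prefill. The upper four reuse the entries from layers 2 (local) and 3 (global).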

Audio understanding: Introducing speech to text and translation

Gemma 3n uses an advanced audio encoder based on the Universal Speech Model (USM). The encoder generates a token for every 160ms of audio (about 6 tokens per second), which are then integrated as input to the language model, providing a granular representation of the sound context.

This integrated audio capability unlocks key features for on-device development, including:

Automatic Speech Recognition (ASR): Enable high-quality speech-to-text transcription directly on the device.

Automatic Speech Translation (AST): Translate spoken language into text in another language.

We’ve observed particularly strong AST results for translation between English and Spanish, French, Italian, and Portuguese, offering great potential for developers targeting applications in these languages. For tasks like speech translation, leveraging Chain-of-Thought prompting can significantly enhance results. Here’s an example:

user
Transcribe the following speech segment in Spanish, then translate it into English: 

model

At launch time, the Gemma 3n encoder is implemented to process audio clips of up to 30 seconds. However, this is not a fundamental limitation. The underlying audio encoder is a streaming encoder, capable of processing arbitrarily long audio with additional long-form audio training. Follow-up implementations will unlock low-latency, long streaming applications.
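
For planning context budgets, the token rate and the clip limit translate into simple arithmetic. The sketch below is illustrative only: rounding partial frames up and splitting long recordings into fixed windows are assumptions, and the helper names are hypothetical.

```python
import math

MS_PER_AUDIO_TOKEN = 160  # from the post: one token per 160 ms of audio
MAX_CLIP_SECONDS = 30     # from the post: launch-time clip limit


def audio_token_count(duration_seconds):
    """Approximate audio tokens for a clip, rounding partial frames up (an assumption)."""
    return math.ceil(duration_seconds * 1000 / MS_PER_AUDIO_TOKEN)


def split_into_clips(duration_seconds, max_clip=MAX_CLIP_SECONDS):
    """Split a recording into consecutive (start, end) windows of at most max_clip seconds."""
    total = float(duration_seconds)
    clips = []
    start = 0.0
    while start < total:
        end = min(start + max_clip, total)
        clips.append((start, end))
        start = end
    return clips


full_clip_tokens = audio_token_count(MAX_CLIP_SECONDS)
windows = split_into_clips(75)
window_tokens = [audio_token_count(end - start) for start, end in windows]
```

Fixed windows can cut a word in half at the boundary, so a real pipeline would likely split on silence or overlap the windows.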

MobileNet-V5: New state-of-the-art vision encoder

Alongside its integrated audio capabilities, Gemma 3n features a new, highly efficient vision encoder, MobileNet-V5-300M, delivering state-of-the-art performance for multimodal tasks on edge devices.

Designed for flexibility and power on constrained hardware, MobileNet-V5 gives developers:

Multiple input resolutions: Natively supports resolutions of 256×256, 512×512, and 768×768 pixels, allowing you to balance performance and detail for your specific applications.

Broad visual understanding: Co-trained on extensive multimodal datasets, it excels at a wide range of image and video comprehension tasks.

High throughput: Processes up to 60 frames per second on a Google Pixel, enabling real-time, on-device video analysis and interactive experiences.
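
One way to use the three supported resolutions is to pick the smallest one that covers the image. The helper below is a hypothetical heuristic, not part of any Gemma 3n API, and the `max_resolution` cap is an assumed knob for trading detail against speed on slower devices.

```python
SUPPORTED_RESOLUTIONS = (256, 512, 768)  # square input sizes listed in the post


def pick_resolution(width, height, max_resolution=768):
    """Pick the smallest supported square resolution covering the image's longer side.

    Falls back to the largest allowed resolution when the image is bigger than that.
    """
    allowed = [r for r in SUPPORTED_RESOLUTIONS if r <= max_resolution]
    if not allowed:
        raise ValueError("max_resolution is below the smallest supported size")
    longer_side = max(width, height)
    for resolution in allowed:
        if longer_side <= resolution:
            return resolution
    return allowed[-1]


thumbnail_res = pick_resolution(200, 150)
photo_res = pick_resolution(4000, 3000)
capped_res = pick_resolution(640, 480, max_resolution=512)
```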

This level of performance is achieved with multiple architectural innovations, including:

An advanced foundation of MobileNet-V4 blocks (including Universal Inverted Bottlenecks and Mobile MQA).

A significantly scaled-up architecture, featuring a hybrid, deep pyramid model that is 10x larger than the biggest MobileNet-V4 variant.

A novel Multi-Scale Fusion VLM adapter that enhances the quality of tokens for better accuracy and efficiency.

Benefiting from novel architectural designs and advanced distillation techniques, MobileNet-V5-300M substantially outperforms the baseline SoViT in Gemma 3 (trained with SigLip, no distillation). On a Google Pixel Edge TPU, it delivers a 13x speedup with quantization (6.5x without), requires 46% fewer parameters, and has a 4x smaller memory footprint, all while providing significantly higher accuracy on vision-language tasks.

We’re excited to share more about the work behind this model. Look out for our upcoming MobileNet-V5 technical report, which will take a deep dive into the model architecture, data scaling strategies, and advanced distillation techniques.

Making Gemma 3n accessible from day one has been a priority. We’re proud to partner with many incredible open source developers to ensure broad support across popular tools and platforms, including contributions from teams behind AMD, Axolotl, Docker, Hugging Face, llama.cpp, LMStudio, MLX, NVIDIA, Ollama, RedHat, SGLang, Unsloth, and vLLM.

But this ecosystem is just the beginning. The true power of this technology is in what you will build with it. That’s why we’re launching the Gemma 3n Impact Challenge. Your mission: use Gemma 3n’s unique on-device, offline, and multimodal capabilities to build a product for a better world. With $150,000 in prizes, we’re looking for a compelling video story and a “wow” factor demo that shows real-world impact. Join the challenge and help build a better future.

Get started with Gemma 3n today

Ready to explore the potential of Gemma 3n today? Here’s how:

Experiment directly: Use Google AI Studio to try Gemma 3n in just a couple of clicks. Gemma models can also be deployed directly to Cloud Run from AI Studio.

Learn & integrate: Dive into our comprehensive documentation to quickly integrate Gemma into your projects or start with our inference and fine-tuning guides.

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Helios marks AMD’s biggest AI infrastructure push yet

The architecture behind Helios The launch of Helios marks AMD’s latest attempt to strengthen its position in a market where Nvidia continues to dominate AI infrastructure. Unlike previous AMD AI offerings centred on individual accelerators, Helios is designed as a complete rack-scale system integrating compute, networking and software. According to

Sheetz replaces VMware at more than 830 stores

The two companies have a relationship dating back to 2020, when Sheetz first deployed StorMagic’s SvSAN software as the hyperconverged storage layer with VMware across hundreds of store locations to virtualize critical in-store application. The setup supported mission-critical applications such as payment processing, loyalty programs, kitchen management and store operations.

Glenfarne Group secures $500 million for Texas LNG development

Glenfarne Group LLC has secured fresh capital to continue development and early construction works for the proposed Texas LNG plant to be constructed on a 625-acre site in the Port of Brownsville, Tex. HPS Investment Partners, a part of BlackRock, agreed to the $500-million investment, which serves as one of

AI workloads shake up observability market

There are 19 vendors that made the cut for Gartner’s new report. Its Leaders quadrant includes (alphabetically) Chronosphere, Coralogix, Datadog, Dynatrace, Elastic, Grafana Labs, IBM, and New Relic. The Challengers are Alibaba Cloud, Amazon Web Services, LogicMonitor, Microsoft, and Splunk. The two Visionaries are BMC Helix and Honeycomb. Those dubbed

S&P Global: Hormuz vessel transits fall amid heightened security risks

Vessel traffic through the Strait of Hormuz remained subdued July 10-12 as heightened regional security risks continued to weigh on movements through the strategic waterway, according to S&P Global MINT and S&P Global Commodities at Sea data. A total of 73 vessels transited the strait during the 3-day period, averaging fewer than 25 crossings/day. Transits fell to 11 on July 12, the lowest since June 14, after Iran declared the strait closed amid what the Persian Gulf Strait Authority described as “illegal movements” of US military forces in the region. No inbound crossings were recorded July 12, the first such occurrence since June 12. Six of the day’s 11 transits were assessed as compliant vessels. Total crossings were 32 on July 10 and 30 on July 11. The Joint Maritime Information Center (JMIC) said July 12 that the regional threat level remained severe. Despite Iran’s closure declaration, JMIC said the southern route remained available and had been expanded for two-way vessel traffic. Energy carriers—including oil, chemical, LPG, and LNG tankers—accounted for about 48% of transits July 10-12. About two-thirds of energy-carrier crossings involved compliant vessels, although only 10 compliant energy carriers entered the Persian Gulf, mostly without visible automatic identification system (AIS) signals. Inbound tanker capacity also softened. An average 6.5 million b/d of new oil and LPG tanker capacity entered the Gulf through Hormuz July 1-12, with VLCCs and Suezmaxes accounting for nearly 80%. Average inbound capacity fell to 6 million b/d July 10-12 from 8.5 million b/d in the first week of July. All compliant outbound energy carriers transiting Hormuz during the 3-day period did so without visible AIS signals, including ADNOC-operated LNG carrier AL HAMRA and several VLCC and product tankers. Iran-linked and US-sanctioned vessels accounted for nearly 60% of all crossings during the period.

Beyond AI Pilots: Scaling AI-Enabled Decision Making in Energy

Date: Thursday, August 6, 2026Time: 11:00 AM (GMT-04:00) Eastern Time – New YorkDuration: 60 minutes Already registered? Click here to log in now. Artificial Intelligence is rapidly becoming a strategic priority across industrial organizations, yet many companies continue to struggle with fragmented data, disconnected workflows, and AI initiatives that never move beyond pilot projects. The challenge is not access to AI—it is creating the business context, governance, and lifecycle intelligence needed to transform AI insights into measurable operational outcomes. Join Siemens Digital Industries Software to learn how Intelligence Center X, part of the Siemens Xcelerator portfolio, helps organizations connect enterprise data, workflows, and AI capabilities into a single governed environment where people and AI work together to drive faster, more informed decisions. In this session, we’ll explore how organizations can: • Move beyond isolated AI experiments to enterprise-scale deployment • Connect engineering, manufacturing, operations, supply chain, and service data into a unified intelligence framework • Enable AI agents to operate within governed, human-in-the-loop business processes • Improve operational performance through AI-assisted decision-making • Accelerate issue resolution, reduce manual effort, and increase organizational agility Attendees will also learn how Intelligence Center X combines lifecycle intelligence, industrial data models, AI orchestration, and low-code application development to create production-ready AI solutions that deliver measurable business value. Real-world examples will demonstrate how organizations have achieved significant improvements, including reductions in manual effort, faster issue resolution, improved data quality, and enhanced decision-making capabilities. 
Whether you are responsible for digital transformation, operations, manufacturing, engineering, or executive strategy, this webinar will provide practical insight into building a scalable foundation for industrial AI and creating a future where people and AI work together to drive business outcomes.

TotalEnergies lets drilling, completions contract for Suriname deepwater oil project

TotalEnergies has let contracts to Halliburton for work on the GranMorgu deepwater oil development project offshore Suriname. The workscope includes drilling and completions services for a long-term program that includes applying integrated digital workflows, real time data, and remote operations control for drilling and completions. As part of the project scope, Halliburton worked with local suppliers to upgrade its liquid mud and cement plant and supported construction of Suriname’s first completions and drilling workshop, featuring advanced maintenance and repair capabilities, the service provider said in a release July 13. The aim of the GranMorgu project is to develop resources on Block 58, which lies about 150 km off the Surinamese coast. Specifically, Sapakara and Krabdagu fields, which contain estimated recoverable reserves of nearly 760 million bbl, TotalEnergies noted on its website. The project’s floating production, storage, and offloading unit (FPSO), with a capacity of 220,000 b/d, is based on tested design principles of units in nearby Guyana and designed for potential future tie-in of satellite fields. Production start-up is expected in 2028. TotalEnergies is operator of the project with 40% interest. Partners are APA Corp. (40%) and state-owned Staatsolie Maatschappij Suriname NV (20%).

Aramco lets stimulation, completion services contract for unconventional gas development

Saudi Aramco has awarded Halliburton a multi-year contract to provide stimulation and completion services for the company’s unconventional gas development program in Saudi Arabia. Halliburton said July 15 that the award is part of a broader multibillion-dollar contract framework supporting the Kingdom’s unconventional resource expansion. Under the agreement, Halliburton will deploy intelligent fracturing automation technologies designed to optimize treatment performance in real time and support execution across multiwell development campaigns. The company said the technologies will enable greater digital integration across field operations. Development of the Jafurah unconventional gas field, the Middle East’s largest liquids-rich shale gas play, is under way. In support of the program, Halliburton plans to expand local manufacturing capacity, strengthen its supply chain network, and increase workforce development initiatives within the Kingdom as activity levels continue to grow. “Beginning in the third quarter of 2026, Halliburton will deploy the Kingdom’s first fully integrated intelligent fracturing platform through OCTIV® Auto Frac and Sensori™ fracturing monitoring services to contribute to asset value for one of the world’s largest unconventional fields,” said Rami Yassine, senior vice-president, Eastern Hemisphere, Halliburton. Jafurah background Jafurah is a key component of Aramco’s gas expansion strategy intended to help meet rising demand for natural gas in power generation and industry. In February 2026, the operator said it seeks to expand sales gas production capacity by about 80% by 2030 compared with 2021 production levels. At the time, Aramco said unconventional shale gas output from Jafurah began in December 2025. The field covers about 17,000 sq km and is estimated to contain 229 tcf of raw gas and 75 billion stb of condensate. 
Aramco expects the development to produce 2 bcfd of sales gas, 420 MMscfd of ethane, and about 630,000 b/d of high-value liquids by 2030.

Digitalization paying off for Rompetrol’s Petromidia refinery

Rompetrol Rafinare SA—jointly owned by Kazakhstan’s state-owned JSC NC KazMunayGas (KMG) subsidiary KMG International NV (54.63%) and Romania’s Ministry of Economy, Energy & Business Environment (44.7%)—is using proprietary operations management software from Emerson Electric Co. to improve alarm performance its more than 5-million tonne/year Petromidia refinery in Năvodari, Romania, on the Black Sea. To date, implementation of Emerson’s DeltaV AgileOps operations management software has helped reduce distributed control system (DCS) alarm volumes at the Petromidia refinery by more than 95%, the service provider said on July 14. Emerson said the project improved alarm performance, increased operator effectiveness, and brought alarm rates within the Engineering Equipment and Materials Users Association (EEMUA) 191 guideline recommendations. Before implementation of DeltaV AgileOps, alarm behavior at the refinery—Romania’s largest—expanded beyond recommended best practices, including high alarm volumes during plant disturbances, nuisance-chattering alarms, and alarms that remained active during normal operation. To address those issues, Rompetrol Rafinare worked with KMG International’s engineering and maintenance services provider SC Rominserv SRL to improve alarm quality and reduce nuisance alarms across the refinery. Use of DeltaV AgileOps—which pulls alarm and event data directly from the DeltaV DCS running the plant—provided continuous visibility into alarm performance, including average and peak alarm rates, recurring alarm sequences, and time spent outside recommended operating thresholds, Emerson said. Following implementation, engineering teams at the refinery used performance dashboards and historical trending to identify high-frequency alarms, stale alarms, and nuisance “bad actor” alarms responsible for disproportionate alarm activity. 
The teams evaluated alarm behavior during steady-state operation, startup conditions, and process disturbances, then assessed proposed changes to alarm limits, priorities, and suppression strategies against plant data. Emerson said the project reduced alarm generation to fewer than 50,000 alarms/month from more than 2 million alarms/month during normal operation. Emerson—which linked the outcome to EEMUA 191 guidance that

EIA: US crude inventories down 1.7 million bbl

US crude oil inventories for the week ended July 10, excluding the Strategic Petroleum Reserve, decreased by 1.7 million bbl from the previous week, according to data from the US Energy Information Administration (EIA). At 409.7 million bbl, US crude oil inventories are about 6% below the 5-year average for this time of year, the EIA report indicated. EIA said total motor gasoline inventories decreased by 1.5 million bbl from last week and are 8% below the 5-year average for this time of year. Finished gasoline inventories and blending components inventories both decreased last week. Distillate fuel inventories increased by 4.6 million bbl last week and are about 11% below the 5-year average for this time of year. Propane-propylene inventories increased by 3 million bbl from last week and are 28% above the 5-year average for this time of year, EIA said. US crude oil refinery inputs averaged 17.1 million b/d for the week ended July 10, which was 99,000 b/d more than the previous week’s average. Refineries operated at 96.2% of capacity. Gasoline production decreased, averaging 9.6 million b/d. Distillate fuel production increased, averaging 5.3 million b/d. US crude oil imports averaged 5.7 million b/d, up 60,000 b/d from the previous week. Over the last 4 weeks, crude oil imports averaged about 5.5 million b/d, 12.2% less than the same 4-week period last year. Total motor gasoline imports averaged 354,000 b/d. Distillate fuel imports averaged 93,000 b/d.

Time to Power: Sage Geosystems CEO Cindy Taff on Geothermal’s AI Infrastructure Moment

Three years ago, the data center industry’s energy conversation was largely framed around emissions. Hyperscale operators were setting carbon-free energy targets, signing renewable power agreements, and aligning their expanding infrastructure portfolios with corporate sustainability commitments. The arrival of generative AI has not eliminated those priorities. But it has reordered them. “Three years ago, data center energy, they were really focused on low emissions, no emissions,” said Cindy Taff, CEO of Sage Geosystems. “Now the primary challenge is just enough energy.” Speaking on the Data Center Frontier Show podcast, Taff described an energy market being reshaped by the speed and physical scale of AI infrastructure development. After decades of relatively flat U.S. electricity demand, AI has introduced a new class of concentrated, rapidly arriving industrial load. The result is a shift away from thinking only about how much generating capacity exists in aggregate and toward a harder question: Can usable power be delivered at a specific site, on a predictable schedule, in the quantities an AI campus requires? For hyperscalers, neocloud providers, data center developers, utilities, and energy companies, that distinction is becoming central to project execution. “I think time to power is the most precious metric right now versus cost or total capacity,” Taff said.

Capacity on Paper Is Not Power at the Site

Announcements of new generation can create the appearance of an energy system capable of meeting rising data center demand. But a megawatt located far from a planned campus, trapped behind a transmission constraint, or unavailable until the next decade has limited value to a developer trying to energize an AI facility within several years. “Aggregate capacity is not going to solve the problem if the power really isn’t where and when you need it,” Taff said. Data centers are large physical facilities tied to specific parcels,

Tech Explainer: Data Center Cooling – Air, Evaporative, Liquid, and Hybrid Approaches

Data Center Cooling Glossary

The following definitions reflect common terminology used in Department of Energy guidance, ASHRAE TC 9.9 materials, Berkeley Lab resources and Green Grid efficiency metrics.

Adiabatic Cooling: A cooling process that uses water evaporation to lower the temperature of air before it reaches a heat exchanger or cooling coil. It can reduce compressor demand but consumes water when evaporative assistance is active.
Air-Cooled Data Center: A facility in which heat is removed from IT equipment primarily by moving conditioned air through servers, even if that heat is later transferred to water or refrigerant elsewhere in the cooling system.
Air Handler: Equipment that moves, filters and conditions air before delivering it to a data hall or other controlled space.
Air-Side Economizer: A system that uses suitable outdoor air, either directly or mixed with return air, to reduce or avoid compressor-based refrigeration.
Airflow Management: The practice of delivering conditioned air where it is needed while preventing hot exhaust air from recirculating into server inlets.
Approach Temperature: The temperature difference between the two fluids in a heat exchanger at their closest thermal point. In a cooling tower, it commonly refers to the difference between leaving-water temperature and entering-air wet-bulb temperature. A smaller approach generally indicates more effective heat transfer.
ASHRAE TC 9.9: The ASHRAE technical committee focused on mission-critical facilities, data centers, technology spaces and electronic equipment. It is a major source of environmental and thermal guidance for data center operators and equipment manufacturers.
Blanking Panel: A panel installed in unused rack spaces to prevent hot exhaust air from recirculating to server intakes.
British Thermal Unit, or BTU: A unit of heat energy commonly used to express the heating or cooling capacity of equipment.
Cabinet: An enclosure, also commonly called

The AI Infrastructure Split Screen: Capital Rush Meets Community Resistance

It would be difficult to construct a more revealing snapshot of the AI infrastructure market than the one delivered in mid-July. In the same news cycle, Csquare completed a billion-dollar initial public offering, Switch was linked to a potential $10 billion IPO, and Databricks reached a reported valuation of $188 billion. At the project level, developers advanced or disclosed campuses measured not in tens or hundreds of megawatts, but in gigawatts, from Meta’s expanding Louisiana complex and Google’s reported Wyoming plans to new Crusoe, QTS, MARA and Tract developments. Yet the same week brought a state-level permitting pause in New York, a decisive project rejection in Palm Beach County, planned protests across more than 20 states, and fresh disputes over parkland, water availability and local control. This is the data center and AI landscape in 2026: capital is abundant but increasingly discriminating; power is more valuable than the underlying real estate; and community consent has become nearly as important as interconnection capacity.

Public Markets Put Different Prices on the AI Stack

The capital-market headlines illustrated how differently investors are valuing the various layers of AI infrastructure. Csquare priced 50 million shares at $21, raising approximately $1.05 billion and establishing an equity valuation of roughly $3.2 billion. The offering was substantial, but it priced below the proposed $23-to-$27 range, and the shares finished their first trading day slightly below the offer price. Brookfield retained approximately 67% of the company’s voting power following the transaction. That reception contrasts sharply with the valuation being discussed for Switch. The DigitalBridge-backed operator has reportedly engaged Goldman Sachs and JPMorgan for a potential IPO that could raise as much as $10 billion and value Switch near $80 billion, including debt.
The transaction remains prospective, but the figure is striking when compared with the $11 billion take-private agreement

New York State just hit pause on the AI data center boom

The moratorium could result in some “border-hopping,” with enterprises hosting local servers in adjacent states like Pennsylvania, Connecticut, or New Jersey, but that’s not likely to be widespread, Kimball noted. The realistic regional impact will be “more of a slow squeeze rather than a shock,” he said. This could result in tighter colocation availability and firmer pricing in the New York metropolitan area over the next few years. Cloud providers may also steer new AI capacity to regions like Georgia, Ohio, Texas, and Utah, where power and permitting are more predictable.

An inflection point, but more trickle-down than direct impact

Indeed, noted Jeremy Roberts, senior director for research and content at Info-Tech Research Group, the moratorium is an “inflection point” and a “way to placate an increasingly angry public.”

TeraWulf’s $19B Anthropic Lease Puts Its Brownfield AI Strategy to the Test

He added that the company’s strategy is centered on owning and operating critical infrastructure, maintaining direct relationships with customers and controlling the long-term evolution of its campuses.

This Model Differs Significantly from the Previous Abernathy JV

TeraWulf and Fluidstack created the Abernathy venture in 2025 to develop a 168-MW critical IT load campus on approximately 120 acres near Abernathy, Texas. The project’s total utility requirement has been described as approximately 240 MW. Fluidstack committed to a 25-year lease at the campus, with Google providing approximately $1.3 billion of credit support for Fluidstack’s obligations. TeraWulf acquired a 50.1% interest in the joint venture through an investment of approximately $450 million. The project subsequently issued $1.3 billion in senior secured notes to support construction and related expenses. The Abernathy agreements were expected to produce approximately $9.5 billion in contracted revenue for the joint venture over the initial 25-year term. Construction has been advancing toward delivery during the second half of 2026. Following the sale, Fluidstack and the other purchasers will control the project. TeraWulf agreed to sell its Abernathy interest for approximately $530 million, compared with its $450 million investment in the joint venture. The consideration is scheduled to be paid in three installments through April 2027, with the proceeds expected to support investment in infrastructure opportunities that TeraWulf intends to own and operate directly. The decision does not necessarily indicate that TeraWulf has become less interested in partnerships with Fluidstack. Fluidstack remains an important tenant at TeraWulf’s Lake Mariner campus in New York, and the companies have built a substantial pipeline of AI infrastructure together. In infrastructure terms, TeraWulf is acting as both developer and capital allocator.
It originated the Abernathy project, helped secure the customer and financing structure, advanced construction and is now monetizing its interest before the campus begins

Comparing Space-Driven Data Center Strategies: Modular Satellites vs. Integrated Rocket Nodes

In addition to developing radiation-tolerant computing, optical communications, deployable solar arrays and orbital thermal-management systems, Cowboy must successfully design, manufacture, test and license a new rocket. Its launch vehicle would require authorization from the Federal Aviation Administration in addition to the approvals needed for the satellite constellation. Cowboy nevertheless enters the race with considerably more capital than Orbital. The company announced a $275 million Series B round in May at a reported $2 billion valuation. It was founded in 2024 by Robinhood co-founder Baiju Bhatt, with an initial focus on space-based solar power before expanding into orbital computing and launch systems.

One Hundred Kilowatts Versus One Megawatt

The clearest distinction between the two proposals is the capacity assigned to each node. Orbital’s production design calls for approximately 100 kilowatts of computing power per satellite. Cowboy is targeting megawatt-class spacecraft, potentially giving each Stampede node approximately 10 times the power capacity of an Orbital satellite. At their stated maximum scales, Orbital’s 100,000 satellites would provide approximately 10 gigawatts. If Cowboy ultimately achieved one megawatt across all 20,000 Stampede spacecraft, its theoretical aggregate capacity would approach 20 gigawatts. Those figures should be treated as design objectives, not capacity forecasts. Neither company has demonstrated even one operational node at its proposed production power level. Orbital’s smaller satellites may be easier to test and deploy incrementally. The company can begin with a single hosted GPU, progress to a purpose-built prototype and expand as launch economics and customer demand permit. Cowboy’s larger nodes could provide more useful computing capacity with fewer satellites and potentially fewer launches.
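The aggregate figures quoted for the two constellations follow from simple multiplication of node count by per-node power. A minimal Python sketch of that arithmetic is shown here; the function name is illustrative, and the inputs are the stated design objectives, not demonstrated capacity.

```python
def aggregate_gw(nodes, kw_per_node):
    """Total capacity in gigawatts for `nodes` spacecraft at `kw_per_node` kilowatts each."""
    return nodes * kw_per_node / 1_000_000  # 1 GW = 1,000,000 kW


orbital_gw = aggregate_gw(100_000, 100)    # 100,000 satellites at about 100 kW each
cowboy_gw = aggregate_gw(20_000, 1_000)    # 20,000 Stampede nodes at 1 MW each
per_node_ratio = 1_000 / 100               # one Cowboy node vs. one Orbital satellite

print(f"Orbital: {orbital_gw:.0f} GW, Cowboy: {cowboy_gw:.0f} GW")
print(f"Per-node power ratio: {per_node_ratio:.0f}x")
```

Under these assumptions, Cowboy would reach twice Orbital's aggregate capacity with one-fifth as many spacecraft.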
Combining the rocket stage and data center would also reduce the amount of structural mass that does not directly support power generation or computing. The tradeoff is concentration risk. The failure of a megawatt Cowboy spacecraft would remove considerably more capacity than

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote $200 billion between them to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.) John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
