Meta’s answer to DeepSeek is here: Llama 4 launches with long-context Scout and Maverick models, and a 2T-parameter Behemoth on the way!

The entire AI landscape shifted back in January 2025, after then-little-known Chinese AI startup DeepSeek (a subsidiary of the Hong Kong-based quantitative analysis firm High-Flyer Capital Management) publicly launched its powerful open source language reasoning model, DeepSeek R1, besting U.S. giants such as Meta.

As DeepSeek usage spread rapidly among researchers and enterprises, Meta was reportedly sent into panic mode upon learning that this new R1 model had been trained for a fraction of the cost of many other leading models yet outclassed them. The training cost was reportedly as little as several million dollars, roughly what Meta pays some of its own AI team leaders.

Meta’s whole generative AI strategy had until that point been predicated on releasing best-in-class open source models under its “Llama” brand for researchers and companies to build upon freely (at least, if they have fewer than 700 million monthly users; beyond that threshold, they are supposed to contact Meta for special paid licensing terms). Yet DeepSeek R1’s astonishingly good performance on a far smaller budget had allegedly shaken company leadership and forced some kind of reckoning; the previous version of Llama, 3.3, had been released just a month prior, in December 2024, yet already looked outdated.

Now we know the fruits of that effort: today, Meta founder and CEO Mark Zuckerberg took to his Instagram account to announce a new Llama 4 series of models, two of which — the 400-billion-parameter Llama 4 Maverick and the 109-billion-parameter Llama 4 Scout — are available today for developers to download and begin using or fine-tuning on llama.com and the AI code sharing community Hugging Face.

A massive 2-trillion-parameter Llama 4 Behemoth is also being previewed today, though Meta’s blog post on the releases said it is still being trained and gave no indication of when it might be released. (Recall that parameters are the internal settings that govern a model’s behavior, and that, generally speaking, more parameters mean a more powerful and complex model all around.)

One headline feature of these models is that they are all multimodal — trained on, and therefore capable of receiving and generating, text, video, and imagery (though audio was not mentioned).

Another is that they have incredibly long context windows — 1 million tokens for Llama 4 Maverick and 10 million for Llama 4 Scout — equivalent to about 1,500 and 15,000 pages of text, respectively, all of which the model can handle in a single input/output interaction. That means a user could theoretically upload or paste up to 7,500 pages’ worth of text and receive that much in return from Llama 4 Scout, which would be handy for information-dense fields such as medicine, science, engineering, mathematics, and literature.
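
As a rough sanity check on those page counts, here is a minimal sketch of the conversion, assuming (my figures, not Meta’s) about 0.75 English words per token and roughly 500 words per page of plain text:

```python
# Rough page-count estimate for the Llama 4 context windows.
# Assumptions (mine, not Meta's): ~0.75 English words per token,
# ~500 words per page of plain text.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

def pages(context_tokens: int) -> float:
    """Approximate number of text pages that fit in a context window."""
    return context_tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(f"Maverick (1M tokens):  ~{pages(1_000_000):,.0f} pages")   # ~1,500
print(f"Scout    (10M tokens): ~{pages(10_000_000):,.0f} pages")  # ~15,000
```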

Here’s what else we’ve learned about this release so far:

All-in on mixture-of-experts

All three models use the “mixture-of-experts” (MoE) architecture approach popularized in earlier model releases from OpenAI and Mistral, which essentially combines multiple smaller models (“experts”), each specialized in different tasks, subjects, and media formats, into a unified, larger model. Each Llama 4 release is therefore said to be a mixture of experts (Maverick, for instance, combines 128 of them) and is more efficient to run, because only the expert needed for a particular task, plus a “shared” expert, handles each token, instead of the entire model having to run for each one.
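
To make that routing idea concrete, here is a minimal, illustrative sketch of an MoE layer in PyTorch in which a router sends each token to a single specialized expert and an always-active shared expert contributes to every token. This is a toy with arbitrary sizes, not Meta’s implementation.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy MoE layer: one routed expert per token plus a shared expert.
    Illustrative only; dimensions and routing are not Meta's."""

    def __init__(self, d_model: int = 64, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)        # scores each token per expert
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.shared_expert = nn.Linear(d_model, d_model)   # always active

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick the single best expert for each token.
        scores = self.router(x)                            # (tokens, n_experts)
        top_idx = scores.argmax(dim=-1)                    # (tokens,)
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():                                 # only the chosen expert runs
                routed[mask] = expert(x[mask])
        return routed + self.shared_expert(x)              # combine with the shared expert

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([10, 64])
```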

As the Llama 4 blog post notes:

As a result, while all parameters are stored in memory, only a subset of the total parameters are activated while serving these models. This improves inference efficiency by lowering model serving costs and latency—Llama 4 Maverick can be run on a single [Nvidia] H100 DGX host for easy deployment, or with distributed inference for maximum efficiency.

Both Scout and Maverick are available to the public for self-hosting, while no hosted API or pricing tiers have been announced for official Meta infrastructure. Instead, Meta is focusing on distribution through open download and through integration with Meta AI in WhatsApp, Messenger, Instagram, and on the web.
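
For developers who want to self-host, a minimal sketch of loading one of the checkpoints with the Hugging Face transformers library might look like the following. The model ID and settings are my assumptions rather than details from this announcement, the checkpoint requires accepting Meta’s Llama license, and even Scout, the smaller model, needs substantial GPU memory (or quantization) to run:

```python
# Hypothetical self-hosting sketch; the model ID and settings are assumptions,
# not confirmed details from this article.
import torch
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed Hugging Face repo name

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,   # full-precision weights need serious GPU memory
    device_map="auto",            # shard across whatever GPUs are available
)

messages = [{"role": "user", "content": "Summarize the Llama 4 release in two sentences."}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```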

Meta estimates the inference cost for Llama 4 Maverick at $0.19 to $0.49 per 1 million tokens (using a 3:1 blend of input and output). This makes it substantially cheaper than proprietary models like GPT-4o, which is estimated to cost $4.38 per million tokens, based on community benchmarks.
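
The “3:1 blend” simply means the quoted per-million-token price assumes three input tokens for every output token. A quick sketch of how such a blended figure and the comparison against the GPT-4o estimate work out (the hypothetical input/output split is mine; the other numbers are the estimates cited above):

```python
# How a "3:1 blended" per-1M-token price is computed, plus the cost comparison
# quoted in the article. Prices are estimates, not official list prices.

def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Weighted average price per 1M tokens, assuming `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)

# Hypothetical split (my assumption) that lands inside Meta's $0.19-$0.49 range:
print(blended_price(input_price=0.25, output_price=0.85))    # 0.40

# Comparison with the community estimate for GPT-4o cited above:
maverick_low, maverick_high, gpt4o = 0.19, 0.49, 4.38
print(f"GPT-4o is roughly {gpt4o / maverick_high:.0f}x to {gpt4o / maverick_low:.0f}x "
      "more expensive per blended 1M tokens")                # ~9x to ~23x
```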

All three Llama 4 models — especially Maverick and Behemoth — are explicitly designed for reasoning, coding, and step-by-step problem solving, though they don’t appear to exhibit the chains of thought of dedicated reasoning models such as OpenAI’s “o” series or DeepSeek R1.

Instead, they seem designed to compete more directly with “classical,” non-reasoning LLMs and multimodal models such as OpenAI’s GPT-4o and DeepSeek’s V3 — with the exception of Llama 4 Behemoth, which does appear to threaten DeepSeek R1 (more on this below!)

In addition, for Llama 4, Meta built custom post-training pipelines focused on enhancing reasoning, such as:

  • Removing over 50% of “easy” prompts during supervised fine-tuning.
  • Adopting a continuous reinforcement learning loop with progressively harder prompts.
  • Using pass@k evaluation and curriculum sampling to strengthen performance in math, logic, and coding (the standard pass@k estimator is sketched after this list).
  • Implementing MetaP, a new technique that lets engineers tune hyperparameters (like per-layer learning rates) on one model and apply them to other model sizes and token budgets while preserving the intended model behavior.
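
Pass@k, referenced in the list above, estimates the probability that at least one of k sampled generations for a problem is correct. The widely used unbiased estimator (from OpenAI’s Codex paper, not anything specific to Llama 4) looks like this:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    one of k samples drawn from n generations (c of which are correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 37 of which pass the unit tests.
print(round(pass_at_k(n=200, c=37, k=1), 3))   # ~0.185, i.e. c/n
print(round(pass_at_k(n=200, c=37, k=10), 3))  # much higher than pass@1
```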

MetaP is of particular interest because it could be used going forward to set hyperparameters on one model and then derive many other types of models from it, increasing training efficiency.

As my VentureBeat colleague and LLM expert Ben Dickson opined on the new MetaP technique: “This can save a lot of time and money. It means that they run experiments on the smaller models instead of doing them on the large-scale ones.”

This is especially critical when training models as large as Behemoth, which uses 32K GPUs and FP8 precision, achieving 390 TFLOPs/GPU over more than 30 trillion tokens—more than double the Llama 3 training data.
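
For a sense of scale, the figures Meta quotes imply an enormous aggregate throughput. The rough arithmetic below uses only the numbers above plus the common ~6 FLOPs-per-parameter-per-token rule of thumb; the active parameter count is my own assumption, not a figure from this article, so treat the resulting estimate as a loose sanity check:

```python
# Back-of-the-envelope scale check using the figures quoted above.
gpus = 32_000                   # 32K GPUs
flops_per_gpu = 390e12          # 390 TFLOPs/GPU achieved with FP8
tokens = 30e12                  # more than 30 trillion training tokens

aggregate_flops_per_s = gpus * flops_per_gpu
print(f"Aggregate throughput: {aggregate_flops_per_s:.2e} FLOPs/s")   # ~1.2e19

# Rough training-time estimate with the common ~6 * active_params * tokens rule.
# The active parameter count is an assumption here, not a figure from this article.
active_params = 288e9
total_train_flops = 6 * active_params * tokens
print(f"Estimated training compute: {total_train_flops:.2e} FLOPs")   # ~5e25
print(f"Implied wall-clock time: "
      f"{total_train_flops / aggregate_flops_per_s / 86_400:.0f} days")  # ~48 days
```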

In other words: the researchers can tell the model broadly how they want it to act, and apply this to larger and smaller versions of the model, and across different forms of media.
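
Meta has not published the details of MetaP, so the snippet below is only a loose illustration of the general idea of hyperparameter transfer (a muP-style, width-scaled learning-rate heuristic, which is my stand-in rather than Meta’s actual recipe): tune on a small proxy model, then rescale for larger ones instead of re-running the search at full scale.

```python
# Illustrative hyperparameter transfer, NOT Meta's MetaP recipe (which is unpublished).
# Idea: tune a base learning rate on a small proxy model, then rescale it for a
# larger model instead of re-running the search at full scale.

def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Scale a per-layer learning rate inversely with hidden width (muP-style heuristic)."""
    return base_lr * base_width / target_width

base_lr = 3e-3          # found by sweeping on a small proxy model (hypothetical value)
proxy_width = 1_024
for width in (4_096, 8_192, 16_384):
    print(f"hidden width {width:>6}: lr ~ {transfer_lr(base_lr, proxy_width, width):.2e}")
```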

A powerful, but not yet the most powerful, model family

In his announcement video on Instagram (a Meta subsidiary, naturally), Meta CEO Mark Zuckerberg said that the company’s “goal is to build the world’s leading AI, open source it, and make it universally accessible so that everyone in the world benefits…I’ve said for a while that I think open source AI is going to become the leading models, and with Llama 4, that is starting to happen.”

It’s clearly a carefully worded statement, as is Meta’s blog post calling Llama 4 Scout “the best multimodal model in the world in its class and is more powerful than all previous generation Llama models” (emphasis added by me).

In other words, these are very powerful models, near the top of the heap compared to others in their parameter-size class, but not necessarily setting new performance records. Nonetheless, Meta was keen to trumpet which models its new Llama 4 family beats, among them:

Llama 4 Behemoth

  • Outperforms GPT-4.5, Gemini 2.0 Pro, and Claude Sonnet 3.7 on:
    • MATH-500 (95.0)
    • GPQA Diamond (73.7)
    • MMLU Pro (82.2)

Llama 4 Maverick

  • Beats GPT-4o and Gemini 2.0 Flash on most multimodal reasoning benchmarks:
    • ChartQA, DocVQA, MathVista, MMMU
  • Competitive with DeepSeek v3.1 (45.8B params) while using less than half the active parameters (17B)
  • Benchmark scores:
    • ChartQA: 90.0 (vs. GPT-4o’s 85.7)
    • DocVQA: 94.4 (vs. 92.8)
    • MMLU Pro: 80.5
  • Cost-effective: $0.19–$0.49 per 1M tokens

Llama 4 Scout

  • Matches or outperforms models like Mistral 3.1, Gemini 2.0 Flash-Lite, and Gemma 3 on:
    • DocVQA: 94.4
    • MMLU Pro: 74.3
    • MathVista: 70.7
  • Unmatched 10M token context length—ideal for long documents, codebases, or multi-turn analysis
  • Designed for efficient deployment on a single H100 GPU

But after all that, how does Llama 4 stack up to DeepSeek?

But of course, there is a whole other class of reasoning-heavy models, such as DeepSeek R1, OpenAI’s “o” series (like o1), Gemini 2.0, and Claude Sonnet 3.7.

Using the highest-parameter model benchmarked—Llama 4 Behemoth—and comparing it to the benchmark chart from the initial DeepSeek R1 release covering DeepSeek R1 and OpenAI’s o1 models, here’s how Llama 4 Behemoth stacks up:

Benchmark    | Llama 4 Behemoth | DeepSeek R1 | OpenAI o1-1217
MATH-500     | 95.0             | 97.3        | 96.4
GPQA Diamond | 73.7             | 71.5        | 75.7
MMLU         | 82.2             | 90.8        | 91.8

What can we conclude?

  • MATH-500: Llama 4 Behemoth is slightly behind DeepSeek R1 and OpenAI o1.
  • GPQA Diamond: Behemoth is ahead of DeepSeek R1, but behind OpenAI o1.
  • MMLU: Behemoth trails both, but still outperforms Gemini 2.0 Pro and GPT-4.5.

Takeaway: While DeepSeek R1 and OpenAI o1 edge out Behemoth on a couple of metrics, Llama 4 Behemoth remains highly competitive and performs at or near the top of the reasoning leaderboard in its class.

Safety and less political ‘bias’

Meta also emphasized model alignment and safety by introducing tools like Llama Guard, Prompt Guard, and CyberSecEval to help developers detect unsafe input/output or adversarial prompts, and implementing Generative Offensive Agent Testing (GOAT) for automated red-teaming.
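
As a sketch of how a developer might wire one of these safety tools into an application, here is a hedged example of running a Llama Guard checkpoint as a prompt classifier with the transformers library. The model ID shown is from an earlier Llama Guard generation and is my assumption, not something this announcement specifies:

```python
# Hedged sketch: classify a user prompt with a Llama Guard checkpoint.
# The model ID is an assumption (an earlier Llama Guard release), not from this article,
# and downloading it requires accepting Meta's license on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-Guard-3-8B"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "How do I wire a generator into my home panel?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=20, do_sample=False)
# Llama Guard replies with a short verdict such as "safe" or "unsafe" plus a category.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```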

The company also claims Llama 4 shows substantial improvement on “political bias,” saying that “specifically, [leading LLMs] historically have leaned left when it comes to debated political and social topics” and that Llama 4 does better at courting the right wing, in keeping with Zuckerberg’s embrace of Republican U.S. President Donald J. Trump and his party following the 2024 election.

Where Llama 4 stands so far

Meta’s Llama 4 models bring together efficiency, openness, and high-end performance across multimodal and reasoning tasks.

With Scout and Maverick now publicly available and Behemoth previewed as a state-of-the-art teacher model, the Llama ecosystem is positioned to offer a competitive open alternative to top-tier proprietary models from OpenAI, Anthropic, DeepSeek, and Google.

Whether you’re building enterprise-scale assistants, AI research pipelines, or long-context analytical tools, Llama 4 offers flexible, high-performance options with a clear orientation toward reasoning-first design.
