Stay Ahead, Stay ONMINE

DeepSeek R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost

DeepSeek R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance. Matching OpenAI’s o1 at just 3%-5% of the cost, this open-source model has not only captivated developers but also challenged enterprises to rethink their AI strategies.

The model has rocketed to the top of the trending downloads on Hugging Face (109,000 downloads as of this writing) as developers rush to try it out and seek to understand what it means for their AI development. Users are commenting that DeepSeek’s accompanying search feature (which you can find at DeepSeek’s site) is now superior to competitors like OpenAI and Perplexity, and is rivaled only by Google’s Gemini Deep Research.

The implications for enterprise AI strategies are profound: With reduced costs and open access, enterprises now have an alternative to costly proprietary models like OpenAI’s. DeepSeek’s release could democratize access to cutting-edge AI capabilities, enabling smaller organizations to compete effectively in the AI arms race.

This story focuses on exactly how DeepSeek managed this feat, and what it means for the vast number of users of AI models. For enterprises developing AI-driven solutions, DeepSeek’s breakthrough challenges assumptions of OpenAI’s dominance and offers a blueprint for cost-efficient innovation. How DeepSeek did what it did is the most instructive part.

DeepSeek’s breakthrough: Moving to pure reinforcement learning

In November, DeepSeek made headlines with its announcement that it had achieved performance surpassing OpenAI’s o1, but at the time it only offered a limited R1-lite-preview model. With Monday’s full release of R1 and the accompanying technical paper, the company revealed a surprising innovation: a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used in training large language models (LLMs).

SFT, a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT). It is considered essential for improving reasoning capabilities. However, DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model.

This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets. While some flaws emerged – leading the team to reintroduce a limited amount of SFT during the final stages of building the model – the results confirmed the fundamental breakthrough: reinforcement learning alone could drive substantial performance gains.

The company got much of the way using open source – a conventional and unsurprising route

First, some background on how DeepSeek got to where it did. DeepSeek, a 2023 spin-off from Chinese hedge fund High-Flyer Quant, began by developing AI models for its proprietary chatbot before releasing them for public use. Little is known about the company’s exact approach, but it quickly open sourced its models, and it’s extremely likely that the company built upon open projects produced by Meta, such as the Llama model and the ML library PyTorch.

To train its models, High-Flyer Quant secured over 10,000 Nvidia GPUs before U.S. export restrictions, and reportedly expanded to 50,000 GPUs through alternative supply routes, despite trade barriers. This pales in comparison to leading AI labs like OpenAI, Google, and Anthropic, which operate with more than 500,000 GPUs each.

DeepSeek’s ability to achieve competitive results with limited resources highlights how ingenuity and resourcefulness can challenge the high-cost paradigm of training state-of-the-art LLMs.

Despite speculation, DeepSeek’s full budget is unknown

DeepSeek reportedly trained its base model — called V3 — on a $5.58 million budget over two months, according to Nvidia engineer Jim Fan. While the company hasn’t divulged the exact training data it used (side note: critics say this means DeepSeek isn’t truly open-source), modern techniques make training on web and open datasets increasingly accessible. Estimating the total cost of training DeepSeek-R1 is challenging. While running 50,000 GPUs suggests significant expenditures (potentially hundreds of millions of dollars), precise figures remain speculative.

What’s clear, though, is that DeepSeek has been very innovative from the get-go. Last year, reports emerged about some of its early innovations, involving techniques such as Mixture of Experts and Multi-Head Latent Attention.

How DeepSeek-R1 got to the “aha moment”

The journey to DeepSeek-R1’s final iteration began with an intermediate model, DeepSeek-R1-Zero, which was trained using pure reinforcement learning. By relying solely on RL, DeepSeek incentivized this model to think independently, rewarding both correct answers and the logical processes used to arrive at them.

This approach led to an unexpected phenomenon: The model began allocating additional processing time to more complex problems, demonstrating an ability to prioritize tasks based on their difficulty. DeepSeek’s researchers described this as an “aha moment,” where the model itself identified and articulated novel solutions to challenging problems (see screenshot below). This milestone underscored the power of reinforcement learning to unlock advanced reasoning capabilities without relying on traditional training methods like SFT.

Source: DeepSeek-R1 paper. Don’t let this graphic intimidate you. The key takeaway is the red line, where the model literally used the phrase “aha moment.” Researchers latched onto this as a striking example of the model’s ability to rethink problems in an anthropomorphic tone. The researchers said it was an “aha moment” of their own.

The researchers conclude: “It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies.”
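To make the idea of “providing the right incentives” concrete, here is a minimal Python sketch of a rule-based reward that scores a response on two signals: whether the final answer is correct, and whether the reasoning is laid out in a checkable format. This is an illustration only, not DeepSeek’s code. The `<think>`/`<answer>` tag layout, the function names and the equal weighting of the two signals are all assumptions made for the example.

```python
import re

# Assumed (hypothetical) layout: reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
RESPONSE_PATTERN = re.compile(
    r"^<think>(?P<reasoning>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>$",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed tag layout, else 0.0."""
    return 1.0 if RESPONSE_PATTERN.match(response.strip()) else 0.0


def accuracy_reward(response: str, expected: str) -> float:
    """Return 1.0 if the tagged answer exactly matches the reference, else 0.0."""
    match = RESPONSE_PATTERN.match(response.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group("answer").strip() == expected.strip() else 0.0


def total_reward(response: str, expected: str) -> float:
    """Sum both signals; the equal weighting is an illustrative assumption."""
    return accuracy_reward(response, expected) + format_reward(response)


# Three toy responses to the same question, whose reference answer is "8".
good = "<think>2 + 2 is 4, doubled is 8.</think><answer>8</answer>"
wrong = "<think>2 + 2 is 5, doubled is 10.</think><answer>10</answer>"
untagged = "8"

for name, response in [("good", good), ("wrong", wrong), ("untagged", untagged)]:
    print(name, total_reward(response, "8"))
```

Note that the format signal in this sketch only checks that reasoning is present and clearly delimited, not that it is sound. The point is that nothing here tells the model how to solve the problem; an RL loop would simply favor responses that score higher.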

More than RL

However, the model needed more than just RL. The paper explains that although RL produced unexpected and powerful reasoning behaviors, the intermediate model DeepSeek-R1-Zero faced some challenges, including poor readability and language mixing (starting in Chinese and switching over to English, for example). Only then did the team decide to create a new model, which would become the final DeepSeek-R1. This model, again based on the V3 base model, was first given a limited round of SFT on a “small amount of long CoT data,” called cold-start data, to fix some of these challenges. After that, it was put through the same reinforcement learning process as R1-Zero. The paper then describes how R1 went through some final rounds of fine-tuning.

The ramifications

One question is why there has been so much surprise at the release. It’s not like open source models are new. Open source models have strong logic and momentum behind them. Their zero cost and malleability are why we reported recently that these models are going to win in the enterprise.

Meta’s open-weights model Llama 3, for example, exploded in popularity last year, as it was fine-tuned by developers wanting their own custom models. Similarly, now DeepSeek-R1 is already being used to distill its reasoning into an array of other, much smaller models – the difference being that DeepSeek offers industry-leading performance. This includes running tiny versions of the model on mobile phones, for example.

DeepSeek-R1 not only performs better than the leading open source alternative, Llama 3; it also shows the entire chain of thought behind its answers transparently. Meta’s Llama hasn’t been instructed to do this by default; it takes aggressive prompting to get Llama to do so.

The transparency has also given OpenAI a PR black eye, since it has so far hidden its chains of thought from users, citing competitive reasons and a desire not to confuse users when a model gets something wrong. Transparency allows developers to pinpoint and address errors in a model’s reasoning, streamlining customizations to meet enterprise requirements more effectively.

For enterprise decision-makers, DeepSeek’s success underscores a broader shift in the AI landscape: leaner, more efficient development practices are increasingly viable. Organizations may need to reevaluate their partnerships with proprietary AI providers, considering whether the high costs associated with these services are justified when open-source alternatives can deliver comparable, if not superior, results.

To be sure, no massive lead

While DeepSeek’s innovation is groundbreaking, by no means has it established a commanding market lead. Because it published its research, other model companies will learn from it, and adapt. Meta and Mistral, the French open source model company, may be a beat behind, but it will probably only be a few months before they catch up. As Meta’s lead researcher Yann LeCun put it: “The idea is that everyone profits from everyone else’s ideas. No one ‘outpaces’ anyone and no country ‘loses’ to another. No one has a monopoly on good ideas. Everyone’s learning from everyone else.” So it’s execution that matters.

Ultimately, it’s the consumers, startups and other users who will win the most, because DeepSeek’s offerings will continue to drive the price of using these models toward zero (aside from the cost of running models at inference). This rapid commoditization could pose challenges – indeed, massive pain – for leading AI providers that have invested heavily in proprietary infrastructure. As many commentators have put it, including Chamath Palihapitiya, an investor and former executive at Meta, this could mean that years of OpEx and CapEx by OpenAI and others will be wasted.

There is substantial commentary about whether it is ethical to use the DeepSeek-R1 model because of the biases instilled in it by Chinese laws, for example that it shouldn’t answer questions about the Chinese government’s brutal crackdown at Tiananmen Square. Despite these ethical concerns, many developers view the biases as infrequent edge cases in real-world applications that can be mitigated through fine-tuning. Moreover, they point to different but analogous biases held by models from OpenAI and other companies. Meta’s Llama has emerged as a popular open model despite its data sets not being made public, and despite hidden biases and the lawsuits filed against it as a result.

Questions abound around the ROI of big investments by OpenAI

This all raises big questions about the investment plans pursued by OpenAI, Microsoft and others. OpenAI’s $500 billion Stargate project reflects its commitment to building massive data centers to power its advanced models. Backed by partners like Oracle and SoftBank, this strategy is premised on the belief that achieving artificial general intelligence (AGI) requires unprecedented compute resources. However, DeepSeek’s demonstration of a high-performing model at a fraction of the cost challenges the sustainability of this approach, raising doubts about OpenAI’s ability to deliver returns on such a monumental investment.

Entrepreneur and commentator Arnaud Bertrand captured this dynamic, contrasting China’s frugal, decentralized innovation with the U.S. reliance on centralized, resource-intensive infrastructure: “It’s about the world realizing that China has caught up – and in some areas overtaken – the U.S. in tech and innovation, despite efforts to prevent just that.” Indeed, yesterday another Chinese company, ByteDance, announced Doubao-1.5-pro, which includes a “Deep Thinking” mode that surpasses OpenAI’s o1 on the AIME benchmark.

Want to dive deeper into how DeepSeek-R1 is reshaping AI development? Check out our in-depth discussion on YouTube, where I explore this breakthrough with ML developer Sam Witteveen. Together, we break down the technical details, implications for enterprises, and what this means for the future of AI.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Ivanti warns customers of new critical flaw exploited in the wild

“The vulnerability is a buffer overflow with a limited character space, and therefore it was initially believed to be a low-risk denial-of-service vulnerability,” incident responders from Google-owned Mandiant wrote in a report on the flaw. “We assess it is likely the threat actor studied the patch for the vulnerability in

Read More »

A look back at Microsoft’s IPO

Speaking of good fortune, Fortune magazine was granted inside access to Gates, his executive and legal teams, and their Wall Street partners in the months leading up to the IPO. That arrangement resulted in a terrific fly-on-the-wall story published four months later. A few highlights gleaned from that story and

Read More »

ServiceNow to acquire Logik.ai to boost CRM portfolio

“With CPQ more seamlessly embedded into the sales and order management capabilities, sellers can increase productivity by exponentially reducing time towards building sales quotes and recording opportunities in the system. But also, as the system learns, it can also recommend the right products and services to add to a particular

Read More »

Mentor Capital Increases Permian Royalty Stakes

Mentor Capital Inc. has expanded its stake in the West Texas Permian Basin, snapping up eight new royalty interest lots in an all-cash deal. The company said in a media release that the royalty streams it purchased pay out a portion of revenue from the oil and gas production “off the top”. The company has no obligation to pay the expenses of the underlying production. With the purchase, Mentor increases its overall ownership of assets in the sector of oil and gas, coal, and uranium by 27.5 percent on a cost basis. “The three major Permian Basin pooled oil and gas projects that Mentor currently participates in represent in total approximately 131 producing wells plus a number of development opportunities”, Mentor said. “This large combined oil and gas footprint is expected to have considerable life. “As is now common in Permian oil fields, some existing and possible wells are projected to utilize multi-leg horizontal and directional drilling with parallel lateral lengths reaching out 2 to 3 miles”. On a cost basis, the latest follow-on purchase increases Mentor’s portfolio of classic energy assets owned to 10.92 cents per Mentor common share, with 21,686,105 shares outstanding, the company said. The acquisition follows Mentor’s purchase of a 25.127 net royalty acre portion of a producing 71-well pooled project in the West Texas Permian Basin in another all-cash transaction. Mentor said at the time its purchased royalty stream was the equivalent of 12.5 percent “off the top” of oil and gas revenues for its acreage, with no responsibility to pay any expenses. To contact the author, email andreson.n.paul@gmail.com WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed. MORE

Read More »

Kenya Extends Oil Import Deal with Gulf Oil Giants through 2027

Kenya has renewed a contract to purchase fuel on credit from three state-owned Gulf firms by 24 months and renegotiated lower margins. Saudi Aramco, Emirates National Oil Co. and Abu Dhabi National Oil Co. will continue to supply gasoline, diesel, kerosene and jet fuel under a 180-day credit plan, Energy and Petroleum Regulatory Authority Director-General Daniel Kiptoo said in an interview in the capital, Nairobi. The two-year extension will kick in “toward the end of the year” once the East African nation completes imports of previously agreed shipments, Kiptoo said. Volume uptake was hampered by neighboring Uganda’s decision to directly source its own fuel products, he said. “The plan has helped stabilize the currency. It also gives us security of supply even in the event of supply shocks,” Kiptoo said. “The structure is working and even other countries are coming to Kenya to replicate it.” Freight and premium costs will drop 11 percent to $78 per metric ton of diesel, 7 percent to $84 for gasoline and 13 percent to $97 for jet fuel. Prices for the products are based on S&P Global Platts benchmark, Kiptoo said. The arrangement saves local oil marketing companies the hassle of sourcing dollars for imports, according to Kiptoo. It is the second time authorities are renewing the contract first drawn up in 2023 as part of a strategy to ease pressure on forex reserves and to support the shilling. Market Distortions The extension is a change of heart for Kenya, which pledged “to exit the oil import arrangement, as we are cognizant of the distortions it has created in the foreign exchange market,” according to a Treasury letter to the International Monetary Fund published in November. It also highlighted “the accompanying increase in rollover risk of the private sector financing facilities supporting it and remain committed to

Read More »

DOE offers 16 locations for possible data center, energy infrastructure development

The U.S. Department of Energy has identified 16 federal locations for potential construction of data centers and associated energy resources. The agency on Thursday published a request for information for stakeholders, including grid operators, about the potential for projects that could be online in less than two years. “The global race for AI dominance is the next Manhattan project, and with President Trump’s leadership and the innovation of our National Labs, the United States can and will win,” Secretary of Energy Chris Wright said in a statement. DOE “is taking important steps to leverage our domestic resources to power the AI revolution.” Data centers today account for about 4.5% of U.S. electricity consumption, but could reach 12% by 2028, the Southwest Energy Efficiency Project noted in a recent report. The RFI aligns with plans Trump announced in January to accelerate power plant development for co-located artificial intelligence data centers using an energy emergency declaration. It is also similar to an executive order former President Biden signed in January, targeting development of AI data centers powered by clean energy. The RFI, however, does not specify clean energy will be used in powering data centers. Responses to the RFI are due within 30 days of its publication in the Federal Register. DOE “seeks to assess industry interest in developing, operating, and maintaining AI infrastructure on select DOE owned or managed lands, along with information on potential development approaches, technology solutions, operational models, and economic considerations,” according to the RFI. It also “seeks input from grid operators that serve DOE sites on opportunities and challenges associated with existing energy infrastructure and potential co-location of data centers with new energy generation.” The RFI seeks input on a range of subjects, including data center “power needs, timelines, and approaches to co-locating energy sources with data

Read More »

SPP to rely on demand response to help bridge shrinking power supplies: CEO Nickell

It is unlikely that enough power supplies can be built in time to meet near-term rising electricity demand in the Southwest Power Pool’s footprint, according to Lanny Nickell, SPP president and CEO. As a result, SPP will need to turn to demand response programs to help bridge that supply-demand gap, Nickell said Thursday during a meeting held by WIRES, a trade group focused on transmission issues. SPP expects its excess capacity will fall to 5% in 2029, down from 24% in 2020, according to Nickell. “Excess generating capacity is dwindling, and it’s dwindling to a point where it’s becoming dangerous,” he said. A lot of generation has to be added quickly to meet a one-day in 10-year loss of load expectation, according to Nickell. “I don’t think it can be added that quickly,” he said. “So what does that mean? Means we’re going to have to rely a lot more on demand response to help us meet this challenge.” SPP is developing a “comprehensive” demand response policy that includes more effective DR options, Nickell said. SPP expects that its peak load could grow to 97 GW by 2035 from 56 GW last year, driven by data centers, home heating electrification and electric vehicles, according to Nickell. SPP operates the grid and wholesale power markets in 14 states from northern Texas to Montana. The grid operator’s interconnection queue has about 135 GW of potential capacity, including nearly 23 GW of gas-fired generation, according to Nickell. “That is by far the most gas generation we’ve ever seen in our generator interconnection queue, by far, and it’s going to be valuable, because it’s going to provide that dispatchability that we need to offset the solar and the wind,” Nickell said. “The storage will be very helpful, too.” Last year, wind farms provided about

Read More »

WTI Sinks 14% in Two Days Amid Global Unrest

Oil tumbled to a four-year low, following a surprise output increase by OPEC+ and a rapidly escalating global trade war that’s also rattling commodities markets from metals to gas. Oil’s rout was triggered Thursday by US President Donald Trump deluge of tariffs, which threaten the global economy and energy consumption. Hours later, OPEC+ tripled a planned output hike for May, in what delegates called a deliberate effort to lower prices to punish members that were pumping above their quota. West Texas Intermediate futures have fallen about 14% in just two days — settling near $61 a barrel in a move similar to steep losses seen during the pandemic — while Brent also ended the day at the lowest since 2021. The declines were exacerbated on Friday by China’s retaliation against the US duties, including a 34% tariff on all imports from the US starting within a week. Other commodities also slumped as wider financial markets took a hit and fears mounted about weaker demand for raw materials. Copper slid as much as 7.7% to the lowest since January, while benchmark European natural gas futures at one point tumbled more than 10%. Glencore Plc shares plunged more than 9%, with fellow major miners BHP Group and Rio Tinto Group also sliding. Oil’s retreat represented a dramatic breakout from a price band of about $15 that has paralyzed trading and spurred bets on low volatility for much of the last six months. During that period, OPEC+ supply curbs were seen to put a floor under the market, while the group’s ample spare capacity acted as a ceiling. This week’s unexpected production increase raises questions about whether the alliance will continue to defend higher prices. The dual hit from OPEC+ and tariffs has prompted a rush by traders and Wall Street banks to

Read More »

Brookfield to Buy Colonial Pipeline Owner in $9B Deal

A group of investors led by Brookfield Infrastructure Partners LP agreed to acquire Colonial Enterprises Inc. in a deal that values the operator of the biggest US fuel pipeline at about $9 billion. Colonial’s five owners are selling their entire stakes to Brookfield, including a Shell plc unit that will transfer its roughly 16 percent interest for $1.45 billion, according to a statement Friday.  Colonial Pipeline operates one of the most important fuel conduits in the US, hauling more than 100 million gallons (2.5 million barrels) of fuel a day from Gulf Coast refineries to the Northeast. It was shut down for five days in 2021 after a cyberattack, leading to fuel shortages across the region.  The deal to buy it comes as a glacial federal permitting process and political opposition continue to make building new pipelines in the US exceedingly difficult – despite US President Donald Trump’s push to expand domestic energy infrastructure. Shares of Brookfield Infrastructure Partners fell 4.6 percent in New York Friday amid the broader market sell off.  Shell’s Midstream Operating unit will sell its stake to a Brookfield unit called Colossus AcquireCo. Colonial’s other owners are the industrial conglomerate Koch Inc., with 28.1 percent, a unit of private equity firm KKR & Co., with 23.4 percent, Canadian pension fund Caisse de Depot et Placement du Quebec, with 16.5 percent, and infrastructure owner IFM Investors Pty with a 15.8 percent share.  Brookfield has already invested in global pipeline assets. It owns a controlling stake in Brazil’s NTS pipeline that spans more than 2,000 kilometers. The asset manager was also part of a consortium that bought a $10.1 billion stake in Abu Dhabi’s natural-gas pipelines in 2020. Colonial is in the midst of a fight with oil majors and trading houses including Exxon Mobil Corp. and Trafigura that ship fuels along its

Read More »

New Intel CEO Lip-Bu Tan begins to lay out technology roadmap

He said that in the past, Intel designed hardware, then partners had to figure out developing the software to make it work. “The world has changed. You have to flip that around. Going forward, we will start with the problem, what you’re trying to solve, and the workloads you need to handle enable. Then we work backward from that, that require embrace the software 2.0 mentality, which means that having a software-first design,” said Tan. Analysts in attendance liked what they heard, even if it was limited in specificity. “What was clear to me was Tan will be focused on eliminating distractions, investing in talent and making sure the company has a more compelling roadmap to compete in the AI data center race,” said Daniel Newman, CEO of The Futurum Group. He said there was a cautious optimism evident at the event as the certainty of its new leadership provided a boost for its partners and employees. “However, there are still more questions than answers, and that should be expected, given his recent arrival and clear philosophy about what needs to come next, which in many ways starkly contrasted what came before,” said Newman. Bob O’Donnell, president and chief analyst with TECHnalysis Research, said the strategy that Tan discussed at his keynote isn’t really much different than those described by his predecessor: build great products and a great foundry business. “That’s not necessarily a bad thing, though, because I believe they’re ultimately the right things for the company to pursue. The difference is that Lip-Bu seemed more willing to tackle the challenge of right-sizing Intel and mentioned cutting things that aren’t core to the business. The big unanswered question is, however, what does he consider those areas/products to be so, as always, the devil is in the details,” he said.

Read More »

Tariff war throws building of data centers into disarray

Forrester’s bottom line? “Because of the long term planning and all of the potential policy changes, I wouldn’t change my data center plans that much,” Nguyen said. Confusion reigns Every day it seems, the tariff situation becomes muddier. For example, according to a fact sheet released Wednesday, the White House has temporarily exempted semiconductors from tariffs, but not the aluminum used to build the servers and racks that house them. Furthermore, Scott Bickley, advisory fellow at the Info-Tech Research Group, said it is important to note how the various countries match with the various components. “Just about every major cost center for the buildout of a data center will be severely impacted by the new tariffs. Servers and hardware, including semiconductors, memory, network components, cabling, construction materials are going to see prices rise overnight once the tariffs go into effect,” Bickley said. “Consider that China, which has a 54% full tariff, is a major source of raw materials and rare earth elements essential for manufacturing DC components while Taiwan, at a 32% tariff rate, is the sole-source provider country for most advanced chipsets used in AI, cell phones, and any modern application footprint requiring high performance in a small footprint. South Korea (25% tariff) is a key provider of memory chips, while Japan (24%), Germany (20% EU rate), and the Netherlands (20% EU rate) are providers of sub-components like server racks, cooling systems, and semiconductor equipment.” But, he continued: “Now factor in the offshore/nearshore contract manufacturers like Mexico and Vietnam (46%) for electronics manufacturing (assembly and distribution) and Malaysia (10%) for semiconductor packaging, and it is clear to see that the complete technology supply chain leading into the data center will be taxed at multiple touchpoints.” Put all of that together and Info-Tech anticipates a lot of enterprise data center pain.

Read More »

New MLCommons benchmarks to test AI infrastructure performance

The latest release also broadens its scope beyond chatbot benchmarks. A new graph neural network (GNN) test targets datacenter-class hardware and is designed for workloads like fraud detection, recommendation engines, and knowledge graphs. It uses the RGAT model based on a graph dataset containing over 547 million nodes and 5.8 billion edges. Judging performance Analysts suggest that these benchmarks will make it easier to judge the performance of various hardware chips and clusters based on documented models. “As every chipmaker seeks to prove that its hardware is good enough to support AI, we now have a standard benchmark that shows the quality of question support, math, and coding skills associated with hardware,” said Hyoun Park, CEO and Chief Analyst at Amalgam Insights.  Chipmakers can now compete not just on traditional speeds and feeds, but in mathematical skill and informational accuracy. This benchmark provides a rare opportunity to add new performance standards on cross-vendor hardware, Park added. “The latency in terms of how quickly tokens are delivered and the time for the user to see the response is the deciding factor,” said Neil Shah, partner and co-founder at Counterpoint Research. “This is where players such as NVIDIA, AMD, and Intel have to get the software right to help developers optimize the models and bring out the best compute performance.” Benchmarking and buying decisions Independent benchmarks like those from MLCommons play a key role in helping buyers evaluate system performance, but relying on them alone may not provide the full picture.

Read More »

Potential Nvidia chip shortage looms as Chinese customers rush to beat US sales ban

Will it lead to shortages? The US first placed export controls on chips sent to China in October 2022 as a means to slow the country’s technological advances. It blocked the sale of Nvidia’s A100 and H100 chips, leading the company to develop the less powerful A800 and H800 chips for the market; they were also subsequently banned. There was a surge in demand for the H20 following the arrival of Chinese startup DeepSeek’s ultra low-cost, open-source AI model in January. And while the H20 is reported to be 15 times slower than Nvidia’s newest Blackwell chips sold elsewhere in the world, it was designed specifically by Nvidia to comply with the further US export controls introduced in October 2023. It is being used by Chinese companies for training, although it’s billed as an inference chip, explained Matt Kimball, VP and principal analyst for datacenter compute and storage at Moor Insights & Strategy. Should Nvidia choose to focus its efforts on manufacturing more of the chips, Kimball said he doesn’t think it will impact supply in the US and Europe, as Blackwell is the main product sold in those markets and H20 is an N-1 Hopper architecture chip. “If you take this a step further and ask whether this large order slows down the production of chips destined for the US and Europe, I’d say the answer is no, as the Hopper family is built on a different process node than the Blackwell family,” he said. Still, Kimball noted, “supply chain management is difficult, especially for smaller organizations that are put to the back of the line as hyperscalers with multibillion dollar orders are first in line for the newest [chips].”

Read More »

European cloud group invests to create what it dubs “Trump-proof cloud services”

But analysts have questioned whether the Microsoft move truly addresses those European business concerns. Commenting on last month’s announcement of the EU Data Boundary for the Microsoft Cloud, Phil Brunkard, executive counselor at Info-Tech Research Group UK, said, “Microsoft says that customer data will remain stored and processed in the EU and EFTA, but doesn’t guarantee true data sovereignty.” And European companies are now rethinking what data sovereignty means to them. They are moving beyond having it refer to where the data sits to focusing on which vendors control it, and who controls them. Responding to the new Euro cloud plan, another analyst, IDC VP Dave McCarthy, saw the effort as “signaling a growing European push for data control and independence.” “US providers could face tougher competition from EU companies that leverage this tech to offer sovereignty-friendly alternatives. Although €1 million isn’t a game-changer on its own, it’s a clear sign Europe wants to build its own cloud ecosystem, potentially at the expense of US market share,” McCarthy said. “For US providers, this could mean investing in more EU-based data centers or reconfiguring systems to ensure European customers’ data stays within the region. This isn’t just a compliance checkbox. It’s a shift that could hike operational costs and complexity, especially for companies used to running centralized setups.” Adding to the potential bad news for US hyperscalers, McCarthy said that there was little reason to believe that this trend would be limited to Europe. “If Europe pulls this off, other regions might take note and push for similar sovereignty rules. US providers could find themselves adapting to a patchwork of regulations worldwide, forcing a rethink of their global strategies,” McCarthy said. “This isn’t just a European headache, it’s a preview of what could become a broader challenge.”

Read More »

Talent gap complicates cost-conscious cloud planning

The top strategy so far is what one enterprise calls the “Cloud Team.” You assemble all your people with cloud skills, and your own best software architect, and have the team examine current and proposed cloud applications, looking for a high-level approach that meets business goals. In this process, the team tries to avoid implementation specifics, focusing instead on the notion that a hybrid application has an agile cloud side and a governance-and-sovereignty data center side, and what has to be done is push functionality into the right place. The Cloud Team supporters say that an experienced application architect can deal with the cloud in abstract, without detailed knowledge of cloud tools and costs. For example, the architect can assess the value of using an event-driven versus transactional model without fixating on how either could be done. The idea is to first come up with approaches. Then, developers could work with cloud providers to map each approach to an implementation, and assess the costs, benefits, and risks. Ok, I lied about this being the top strategy—sort of, at least. It’s the only strategy that’s making much sense. The enterprises all start their cloud-reassessment journey on a different tack, but they agree it doesn’t work. The knee-jerk approach to cloud costs is to attack the implementation, not the design. What cloud features did you pick? Could you find ones that cost less? Could you perhaps shed all the special features and just host containers or VMs with no web services at all? Enterprises who try this, meaning almost all of them, report that they save less than 15% on cloud costs, a rate of savings that means roughly a five-year payback on the costs of making the application changes…if they can make them at all. Enterprises used to build all of

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet this non-tech company has become a regular at showing off technology at the big tech trade show in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »