When AI reasoning goes wrong: Microsoft Research shows more tokens can mean more problems


Large language models (LLMs) are increasingly capable of complex reasoning through “inference-time scaling,” a set of techniques that allocate more computational resources during inference to generate answers. However, a new study from Microsoft Research reveals that the effectiveness of these scaling methods isn’t universal. Performance boosts vary significantly across different models, tasks and problem complexities.

The core finding is that simply throwing more compute at a problem during inference doesn’t guarantee better or more efficient results. The findings can help enterprises better understand cost volatility and model reliability as they look to integrate advanced AI reasoning into their applications.

Putting scaling methods to the test

The Microsoft Research team conducted an extensive empirical analysis across nine state-of-the-art foundation models. These included “conventional” models such as GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Pro and Llama 3.1 405B, as well as models specifically fine-tuned for enhanced reasoning through inference-time scaling: OpenAI’s o1 and o3-mini, Anthropic’s Claude 3.7 Sonnet, Google’s Gemini 2.0 Flash Thinking and DeepSeek-R1.

They evaluated these models using three distinct inference-time scaling approaches:

  1. Standard Chain-of-Thought (CoT): The basic method where the model is prompted to answer step-by-step.
  2. Parallel Scaling: The model generates multiple independent answers for the same question and uses an aggregator (like majority vote or selecting the best-scoring answer) to arrive at a final result.
  3. Sequential Scaling: The model iteratively generates an answer and uses feedback from a critic (potentially from the model itself) to refine the answer in subsequent attempts.
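
The parallel and sequential strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: `generate` and `critic` are hypothetical callables standing in for a model call and a critic, and majority vote is only one of the possible aggregators.

```python
from collections import Counter
from typing import Callable, Optional


def parallel_majority_vote(generate: Callable[[str], str], question: str, n: int) -> str:
    """Parallel scaling: sample n independent answers and return the most common one."""
    answers = [generate(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def sequential_refine(
    generate: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Optional[str]],
    question: str,
    max_rounds: int,
) -> str:
    """Sequential scaling: regenerate with critic feedback until the critic has no objection."""
    feedback = None
    answer = generate(question, feedback)
    for _ in range(max_rounds - 1):
        feedback = critic(question, answer)
        if feedback is None:  # hypothetical convention: None means the critic accepts
            break
        answer = generate(question, feedback)
    return answer
```

Both functions spend more inference calls on the same question; the difference is whether the extra attempts are independent (parallel) or informed by feedback on the previous attempt (sequential).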

These approaches were tested on eight challenging benchmark datasets covering a wide range of tasks that benefit from step-by-step problem-solving: math and STEM reasoning (AIME, Omni-MATH, GPQA), calendar planning (BA-Calendar), NP-hard problems (3SAT, TSP), navigation (Maze) and spatial reasoning (SpatialMap).

Several benchmarks included problems with varying difficulty levels, allowing for a more nuanced understanding of how scaling behaves as problems become harder.

“The availability of difficulty tags for Omni-MATH, TSP, 3SAT, and BA-Calendar enables us to analyze how accuracy and token usage scale with difficulty in inference-time scaling, which is a perspective that is still underexplored,” the researchers wrote in the paper detailing their findings.

The researchers evaluated the Pareto frontier of LLM reasoning by analyzing both accuracy and the computational cost (i.e., the number of tokens generated). This helps identify how efficiently models achieve their results. 

Inference-time scaling Pareto frontier Credit: arXiv
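
To make the idea of an accuracy-versus-cost Pareto frontier concrete, here is a small sketch. The model names and numbers are invented for illustration and are not results from the paper; the function simply keeps every model that no other model beats on both token cost and accuracy.

```python
from typing import List, Tuple

# (model name, average tokens per answer, accuracy); illustrative values only
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep models that no other model beats on both cost and accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Sort by cost ascending; break cost ties by higher accuracy first.
    for point in sorted(points, key=lambda p: (p[1], -p[2])):
        if point[2] > best_accuracy:
            frontier.append(point)
            best_accuracy = point[2]
    return frontier


models = [
    ("model_a", 1000.0, 0.50),
    ("model_b", 4000.0, 0.70),
    ("model_c", 9000.0, 0.65),  # dominated: costs more than model_b and is less accurate
    ("model_d", 15000.0, 0.80),
]
print([name for name, _, _ in pareto_frontier(models)])
```

A model that falls off the frontier, like the hypothetical `model_c` here, is spending extra tokens without buying extra accuracy.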

They also introduced the “conventional-to-reasoning gap” measure, which compares the best possible performance of a conventional model (using an ideal “best-of-N” selection) against the average performance of a reasoning model, estimating the potential gains achievable through better training or verification techniques.
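
As a rough illustration of how such a gap could be computed from logged results, the sketch below treats each model's results as a list of problems, each holding one correct/incorrect flag per attempt. The data layout, the function names and the order of subtraction are assumptions made for this example; the paper's exact definition may differ.

```python
from typing import List


def best_of_n_accuracy(runs: List[List[bool]]) -> float:
    """Fraction of problems where at least one of the N attempts is correct."""
    return sum(any(attempts) for attempts in runs) / len(runs)


def average_accuracy(runs: List[List[bool]]) -> float:
    """Mean correctness over all attempts on all problems."""
    total = sum(len(attempts) for attempts in runs)
    return sum(sum(attempts) for attempts in runs) / total


def conventional_to_reasoning_gap(
    conventional: List[List[bool]], reasoning: List[List[bool]]
) -> float:
    """Assumed sign convention: conventional best-of-N minus reasoning average."""
    return best_of_n_accuracy(conventional) - average_accuracy(reasoning)


# Illustrative data: 4 problems, several attempts each.
conventional_runs = [
    [True, False, False],
    [False, False, False],
    [False, True, True],
    [False, False, True],
]
reasoning_runs = [[True, True], [True, False], [True, True], [False, False]]
print(conventional_to_reasoning_gap(conventional_runs, reasoning_runs))
```

Best-of-N with an ideal selector is an upper bound: it assumes something can always pick the correct attempt when one exists, which is why the measure points at what better verification could unlock.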

More compute isn’t always the answer

The study provided several crucial insights that challenge common assumptions about inference-time scaling:

Benefits vary significantly: While models tuned for reasoning generally outperform conventional ones on these tasks, the degree of improvement varies greatly depending on the specific domain and task. Gains often diminish as problem complexity increases. For instance, performance improvements seen on math problems didn’t always translate equally to scientific reasoning or planning tasks.

Token inefficiency is rife: The researchers observed high variability in token consumption, even between models achieving similar accuracy. For example, on the AIME 2025 math benchmark, DeepSeek-R1 used over five times more tokens than Claude 3.7 Sonnet for roughly comparable average accuracy. 

More tokens do not always lead to higher accuracy: Contrary to the intuitive idea that longer reasoning chains mean better reasoning, the study found this isn’t always true. “Surprisingly, we also observe that longer generations relative to the same model can sometimes be an indicator of models struggling, rather than improved reflection,” the paper states. “Similarly, when comparing different reasoning models, higher token usage is not always associated with better accuracy. These findings motivate the need for more purposeful and cost-effective scaling approaches.”

Cost nondeterminism: Perhaps most concerning for enterprise users, repeated queries to the same model for the same problem can result in highly variable token usage. This means the cost of running a query can fluctuate significantly, even when the model consistently provides the correct answer. 

Variance in response length (spikes show smaller variance) Credit: arXiv

The potential in verification mechanisms: Scaling performance consistently improved across all models and benchmarks when simulated with a “perfect verifier” (using the best-of-N results). 

Conventional models sometimes match reasoning models: By significantly increasing inference calls (up to 50x more in some experiments), conventional models like GPT-4o could sometimes approach the performance levels of dedicated reasoning models, particularly on less complex tasks. However, these gains diminished rapidly in highly complex settings, indicating that brute-force scaling has its limits.

On some tasks, the accuracy of GPT-4o continues to improve with parallel and sequential scaling. Credit: arXiv

Implications for the enterprise

These findings carry significant weight for developers and enterprise adopters of LLMs. The issue of “cost nondeterminism” is particularly stark and makes budgeting difficult. As the researchers point out, “Ideally, developers and users would prefer models for which the standard deviation on token usage per instance is low for cost predictability.”

“The profiling we do in [the study] could be useful for developers as a tool to pick which models are less volatile for the same prompt or for different prompts,” Besmira Nushi, senior principal research manager at Microsoft Research, told VentureBeat. “Ideally, one would want to pick a model that has low standard deviation for correct inputs.” 

Models that peak in blue toward the left consistently generate a similar number of tokens on the given task. Credit: arXiv
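
A team that wants to run this kind of profiling on its own prompts could start from something like the sketch below. It is an assumption-laden illustration rather than the researchers' tooling: the log format, the function name and the choice of population standard deviation are all invented for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Each run is (tokens generated, answered correctly); values are illustrative only.
Runs = Dict[str, List[Tuple[int, bool]]]


def token_volatility(runs_by_problem: Runs) -> float:
    """Average per-problem standard deviation of token usage over correct runs."""
    deviations = []
    for runs in runs_by_problem.values():
        correct_tokens = [tokens for tokens, ok in runs if ok]
        if len(correct_tokens) >= 2:  # need at least two runs to measure spread
            deviations.append(pstdev(correct_tokens))
    return mean(deviations) if deviations else 0.0


steady_model = {"p1": [(1000, True), (1000, True), (1000, True)]}
volatile_model = {"p1": [(1000, True), (5000, True), (3000, False)]}
print(token_volatility(steady_model), token_volatility(volatile_model))
```

Restricting the calculation to correct runs follows Nushi's point about picking a model with low standard deviation for correct inputs: it isolates cost spread from accuracy spread.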

The study also sheds light on the correlation between a model’s accuracy and response length. For example, one of the paper’s diagrams shows that responses to math queries that run past roughly 11,000 tokens have a very slim chance of being correct, and those generations should either be stopped at that point or restarted with some sequential feedback. However, Nushi points out that models allowing these post hoc mitigations also have a cleaner separation between correct and incorrect samples.
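
One way to act on such a threshold is sketched below: first check on logged samples how often long responses are correct, then cap generation at a token budget. The sample data and both function names are hypothetical, and any cutoff (the roughly 11,000-token figure comes from the article's math example) would need to be measured per model and task.

```python
from typing import Iterable, List, Tuple


def accuracy_above(samples: List[Tuple[int, bool]], cutoff: int) -> float:
    """Share of correct answers among responses longer than cutoff tokens."""
    long_runs = [ok for tokens, ok in samples if tokens > cutoff]
    return sum(long_runs) / len(long_runs) if long_runs else 0.0


def generate_with_budget(token_stream: Iterable[str], budget: int) -> Tuple[List[str], bool]:
    """Consume a token stream, stopping once the budget is exhausted.

    Returns the tokens kept and a flag that is True when generation was cut off.
    """
    kept: List[str] = []
    for token in token_stream:
        if len(kept) >= budget:
            return kept, True
        kept.append(token)
    return kept, False


# Illustrative log of (response length in tokens, correct?) pairs.
logged = [(2000, True), (12000, False), (15000, False), (13000, True)]
print(accuracy_above(logged, cutoff=11000))
```

When the cut-off flag comes back `True`, the caller can either give up on that attempt or restart it with feedback, which are the two mitigations the article describes.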

“Ultimately, it is also the responsibility of model builders to think about reducing accuracy and cost non-determinism, and we expect a lot of this to happen as the methods get more mature,” Nushi said. “Alongside cost nondeterminism, accuracy nondeterminism also applies.”

Another important finding is the consistent performance boost from perfect verifiers, which highlights a critical area for future work: building robust and broadly applicable verification mechanisms. 

“The availability of stronger verifiers can have different types of impact,” Nushi said, such as improving foundational training methods for reasoning. “If used efficiently, these can also shorten the reasoning traces.”

Strong verifiers can also become a central part of enterprise agentic AI solutions. Many enterprise stakeholders already have such verifiers in place, such as SAT solvers and logistic validity checkers, which may need to be repurposed for more agentic solutions.

“The questions for the future are how such existing techniques can be combined with AI-driven interfaces and what is the language that connects the two,” Nushi said. “The necessity of connecting the two comes from the fact that users will not always formulate their queries in a formal way, they will want to use a natural language interface and expect the solutions in a similar format or in a final action (e.g. propose a meeting invite).”

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Arm jumps on the Nvidia NVLink Fusion bandwagon at SC25

“Some partners want to mix different CPUs and accelerate technologies for specialized use cases,” said Dion Harris, senior director, HPC and AI infrastructure solutions at Nvidia. “NVLink Fusion enables hyperscalers and custom ASIC builders to leverage Nvidia’s rack scale architecture to rapidly deploy custom silicon,” Harris said during a media

Read More »

Nvidia touts next-gen quantum computing interconnects

Paired with NVQLink is CUDA-Q, which provides a programming model where quantum processors (QPU), GPUs, and CPUs all work together in the same application. This is necessary because most useful quantum algorithms will rely on traditional computing for tasks like control, optimization, and error correction. The first quantum computing company

Read More »

ConocoPhillips Makes Offshore Gas Discovery in Australia’s Otway Basin

ConocoPhillips made a natural gas discovery offshore Victoria in the Otway Basin, though further work is needed to determine potential flow rates, the United States company’s Australian unit said Monday. “The Essington-1 well is the first discovery in the Otway since 2021 and is a promising start to ConocoPhillips’ exploration activities in the region”, ConocoPhillips Australia president Jan-Arne Johansen said in an online statement. “The initial results are encouraging, and we look forward to continuing drilling our second exploration well in December”. ConocoPhillips Australia said, “Preliminary estimates from logs and wireline results place the primary Waarre A target reservoir as a 62.6-meter gross hydrocarbon column. The secondary Waarre C target shows a further 33.2-meter gross hydrocarbon column as best estimates”. 3D Energi said separately, “Elevated gas readings were recorded in both the Waarre C (intersected at 2,265 meters MDRT) and Waarre A (intersected at 2,515 meters MDRT) reservoirs”. “In both reservoirs, gas peaks coincide with elevated resistivity readings observed on Logging While Drilling tools, consistent with probable hydrocarbon presence”, 3D Energi added. The discovery sits 12 kilometers (7.46 miles) from producing gas wells and about 53 kilometers (32.93 miles) from Port Campbell, Victoria, according to ConocoPhillips Australia. “Further work will be conducted to determine potential flow rates, the reservoir’s ultimate resource recovery and the commercial viability for potential development plans”, ConocoPhillips Australia said. The partners expect to complete operations at the well this month, after which the well will be plugged and abandoned. “A second well in VIC/P79 (Charlemont-1) expected to commence in December (weather and operational conditions permitting) and additional wells may be considered in the future under the accepted Environmental Plan”, ConocoPhillips Australia said. 
It announced the start of the Otway exploration campaign November 1, “in an effort to find new domestic natural gas supply and be part of

Read More »

Turkey Plans $4B Sukuk in Energy Production Push

Turkey’s state energy company Turkiye Petrolleri AO plans to sell as much as $4 billion in Islamic debt as part of its push to expand oil and gas production, marking the firm’s first such international debt offering. The company, also known by its Turkish initials TPAO, is preparing to issue the five-year sukuk to international investors by the end of the year, Energy Minister Alparslan Bayraktar told Bloomberg on Monday. The debut sukuk follows non-deal roadshow meetings in London, Abu Dhabi and Dubai, where officials briefed potential investors on TPAO’s financial outlook and projects, including Black Sea natural gas production and the Gabar oil field in Turkey’s southeast, he said. Owned by Turkey’s sovereign wealth fund, TPAO also has a growing portfolio of international projects including exploration plans in Libya, Oman and Pakistan alongside existing production in Azerbaijan, Iraq and Russia.  TPAO produced 33.7 million barrels of oil and 2.2 billion cubic meters of gas in Turkey in 2024, former CEO Ahmet Turkoglu told a parliamentary commission earlier this year. It also pumped 39.4 million barrels of oil equivalent from international projects.  He said that the company made a profit of 15.4 billion liras last year – equivalent to around $390 million at the time of the comments. Production is set to increase both at home and abroad. Turkey plans to increase output at the main Black Sea gas field, Sakarya, to 45 million cubic meters per day in 2028 from the current 9.5 mcm, Bayraktar said. TPAO is also planning to develop unconventional reserves in the southeast in partnership with US-based Continental Resources, Inc. and TransAtlantic Petroleum Ltd.  TPAO established a subsidiary, TPAO Varlik Kiralama, earlier this month to manage the sukuk issuance. The debt sale comes as Turkey’s borrowing costs decline due to an easing of political tensions at

Read More »

ADNOC Gas Achieves Record Q3

ADNOC Gas PLC has reported an eight percent year-on-year increase in net profit to $1.34 billion for the third quarter, the company’s highest for the July-September period. The increase was driven by a four percent rise in domestic gas sales volumes, according to an online statement by the company. Demand is supported by growth in the United Arab Emirates’ economy, while contract negotiations also improved underlying margins, said the gas processing and sales arm of Abu Dhabi National Oil Co. Earnings per share landed at $0.017. ADNOC Gas has extended its five percent annual dividend growth policy to 2030, aiming for $24.4 billion in total for 2025-30, according to a stock filing October 8. ADNOC Gas has introduced a policy to distribute dividends quarterly starting with Q3 2025. “The introduction of quarterly dividend distributions starting in Q3 2025 with $896 million to be paid by December 12 – alongside a five percent annual increase in dividend payout now extended until 2030 – offers greater transparency and even more regular income, allowing shareholders to plan and manage their finances with confidence”, it said in its quarterly statement. ADNOC Gas said, “Year-to-date net income reached $3.99 billion, exceeding market expectations, even as oil prices averaged $71/barrel in the first nine months of 2025 compared to $83/barrel in 2024”. “Q3 2025 saw ADNOC Gas’ domestic gas business deliver record results, with EBITDA rising to $914 million, up 26 percent year-on-year”. On lower prices, revenue fell from $4.87 billion for Q3 2024 to $4.86 billion for Q3 2025. Operating profit landed at $1.74 billion, up from $1.69 billion for Q3 2024. Profit before tax was $1.72 billion, up from $1.68 billion for Q3 2024. Net cash from operating activities before changes in working capital was $4.65 billion, up from $4.24 billion for Q3 2024. ADNOC Gas ended

Read More »

Oil Slips as Russia Port Reopens

Oil ticked lower as signs that activity had resumed at a key Russian port were countered by wider geopolitical risks to prices. West Texas Intermediate fell 0.3% to settle below $60 a barrel after adding more than 2% on Friday following an attack on Russia’s Novorossiysk facility. Two tankers moored on Sunday at the port, indicating operational activity. The dollar strengthened, making commodities priced in the currency less attractive.  The attack on Novorossiysk by Ukrainian forces, along with Iran’s seizure of an oil tanker near the Strait of Hormuz, injected a fresh geopolitical premium into prices as the market faces pressure from an emerging global surplus.  Traders are also monitoring the Trump Administration’s plans in oil-rich Venezuela. US President Donald Trump said on Monday he is not ruling out sending troops to the South American country and said he is willing to talk to counterpart Nicolas Maduro. Elsewhere, crude oil exports from Sudan were disrupted after a series of attacks hit energy facilities in the country, which serves as a key conduit for crude from landlocked South Sudan. Those risks are countering moves by OPEC+ and producers from outside of the group to ramp up output. The increases leave most traders expecting a significant surplus over the coming months.  “Brent crude oil prices continue to fluctuate in a $60-$70 a-barrel range, with the market focus shifting to how Russian oil exports will evolve over the coming months,” UBS analyst Giovanni Staunovo wrote in a note. “The market appears skeptical that Russia will struggle to export its oil barrels.” Moscow’s oil has begun to trade at a significant discount in recent days as the deadline nears for fresh sanctions on its two major producers to kick in. Prices are at the lowest level in over two-and-a-half years, according to Argus Media

Read More »

Joint Statement by the U.S. and the Baltic Countries Following the 2025 Baltic 3+1 Energy Dialogue

On November 7, the Ministers of Energy for the three Baltic countries of Estonia, Latvia, and Lithuania and the U.S. Secretary of Energy convened for a fifth 3+1 Energy Dialogue in Athens, Greece, on the margins of the Partnership for Transatlantic Energy Cooperation (P-TEC) Ministerial. Estonia’s Minister of Energy and Environment Andres Sutt, Latvia’s Minister for Climate and Energy Kaspars Melnis, Lithuania’s Minister of Energy Žygimantas Vaičiūnas, and U.S. Secretary of Energy Chris Wright reaffirmed their shared commitment to strengthening transatlantic relations and deepening the strategic partnership; increasing the security, resilience and protection of critical energy infrastructure; growing U.S. liquified natural gas (LNG) imports for European independence from Russian energy and to support Ukraine; and enabling innovative nuclear deployment. The ministers also celebrated the success of their joint multi-year effort to desynchronize the Baltic electricity grid from Russia and synchronize with the Continental European network, which was achieved in February of this year.   The Ministers and Secretary discussed ongoing critical energy infrastructure security and resilience concerns acutely affecting the Baltic region, but applicable globally, including: physical security of undersea cables and pipelines and physical and cyber-security of electricity grid infrastructure ranging from the bulk power system to behind-the-meter technologies. The group noted recent exercises conducted by the U.S. Department of Energy and experts at the Pacific Northwest National Laboratory in Riga, Latvia to enhance responsiveness to grid disruptions and pledged to continue cooperation on infrastructure security.   The Ministers and Secretary also noted the importance of the security of energy supply, emphasizing that there can be no national security without energy security. They discussed a shared desire to increase the import of abundant U.S. 
LNG to the Baltic region and the whole of Europe. They also considered the possibility of transferring imported LNG to Ukraine on the basis of negotiated agreements

Read More »

The quiet revolution in energy: How private innovation is reshaping the grid

Over the past five years, the energy transition has been loud in some places.  We have seen data centers booming, capacity prices climbing, renewables surging and virtual power plants gaining traction. But beneath those headlines, a revolution message has been unfolding, sector by sector, driven not by policy or utilities but by private businesses stepping into roles once reserved for monopolies. Take Caterpillar, for example. Long known for its heavy machinery, the company now offers a Distributed Energy Resource Management System (DERMS) platform, an unexpected pivot that underscores how deeply energy intelligence is being woven into commercial and industrial operations. From retail and manufacturing to real estate and tech, companies are learning that energy is no longer a fixed cost. It’s a controllable variable, one that can be optimized, monetized and aligned with long-term business strategy. Commercial Customers: The Grid’s Unsung Stabilizers At the center of this shift are commercial customers, who are increasingly the steady force in an uncertain grid. Unlike residential users, whose Peak Load Contribution (PLC) is largely uncontrollable and often meaningless for planning or cost-reduction purposes, commercial PLCs can be actively managed. And that’s where the real innovation is happening. Multifamily communities are increasingly operating like commercial customers, and that shift is transforming their role on the grid. By pairing the right supply contracts with DERMS platforms, multifamily developers and operators can now manage usage, flexibility and capacity just like office parks or manufacturing sites have done for years. This isn’t theoretical; it’s already underway. Properties are optimizing load, reducing costs and contributing to grid stability by applying commercial-grade energy intelligence to what was once a passive residential segment. The result: a fast-growing, data-driven force for grid support emerging from the multifamily sector. 
From Energy Management to Energy Mastery Nationwide Energy Partners (NEP) has seen this

Read More »

Nvidia’s first exascale system is the 4th fastest supercomputer in the world

The world’s fourth exascale supercomputer has arrived, pitting Nvidia’s proprietary chip technologies against the x86 systems that have dominated supercomputing for decades. For the 66th edition of the TOP500, El Capitan holds steady at No. 1 while JUPITER Booster becomes the fourth exascale system on the list. The JUPITER Booster supercomputer, installed in Germany, uses Nvidia CPUs and GPUs and delivers a peak performance of exactly 1 exaflop, according to the November TOP500 list of supercomputers, released on Monday. The exaflop measurement is considered a major milestone in pushing computing performance to the limits. Today’s computers are typically measured in gigaflops and teraflops—and an exaflop translates to 1 billion gigaflops. Nvidia’s GPUs dominate AI servers installed in data centers as computing shifts to AI. As part of this shift, AI servers with Nvidia’s ARM-based Grace CPUs are emerging as a high-performance alternative to x86 chips. JUPITER is the fourth-fastest supercomputer in the world, behind three systems with x86 chips from AMD and Intel, according to TOP500. The top three supercomputers on the TOP500 list are in the U.S. and owned by the U.S. Department of Energy. The top two supercomputers—the 1.8-exaflop El Capitan at Lawrence Livermore National Laboratory and the 1.35-exaflop Frontier at Oak Ridge National Laboratory—use AMD CPUs and GPUs. The third-ranked 1.01-exaflop Aurora at Argonne National Laboratory uses Intel CPUs and GPUs. Intel scrapped its GPU roadmap after the release of Aurora and is now restructuring operations. The JUPITER Booster, which was assembled by France-based Eviden, has Nvidia’s GH200 superchip, which links two Nvidia Hopper GPUs with CPUs based on ARM designs. The CPU and GPU are connected via Nvidia’s proprietary NVLink interconnect, which is based on InfiniBand and provides bandwidth of up to 900 gigabytes per second. JUPITER first entered the Top500 list at 793 petaflops, but

Read More »

Samsung’s 60% memory price hike signals higher data center costs for enterprises

Industry-wide price surge driven by AI Samsung is not alone in raising prices. In October, TrendForce reported that Samsung and SK Hynix raised DRAM and NAND flash prices by up to 30% for Q4. Similarly, SK Hynix said during its October earnings call that its HBM, DRAM, and NAND capacity is “essentially sold out” for 2026, with the company posting record quarterly operating profit exceeding $8 billion, driven by surging AI demand. Industry analysts attributed the price increases to manufacturers redirecting production capacity. HBM production for AI accelerators consumes three times the wafer capacity of standard DRAM, according to a TrendForce report, citing remarks from Micron’s Chief Business Officer. After two years of oversupply, memory inventories have dropped to approximately eight weeks from over 30 weeks in early 2023. “The memory industry is tightening faster than expected as AI server demand for HBM, DDR5, and enterprise SSDs far outpaces supply growth,” said Manish Rawat, semiconductor analyst at TechInsights. “Even with new fab capacity coming online, much of it is dedicated to HBM, leaving conventional DRAM and NAND undersupplied. Memory is shifting from a cyclical commodity to a strategic bottleneck where suppliers can confidently enforce price discipline.” This newfound pricing power was evident in Samsung’s approach to contract negotiations. “Samsung’s delayed pricing announcement signals tough behind-the-scenes negotiations, with Samsung ultimately securing the aggressive hike it wanted,” Rawat said. 
“The move reflects a clear power shift toward chipmakers: inventories are normalized, supply is tight, and AI demand is unavoidable, leaving buyers with little room to negotiate.” Charlie Dai, VP and principal analyst at Forrester, said the 60% increase “signals confidence in sustained AI infrastructure growth and underscores memory’s strategic role as the bottleneck in accelerated computing.” Servers to cost 10-25% more For enterprises building AI infrastructure, these supply dynamics translate directly into

Read More »

Arista, Palo Alto bolster AI data center security

“Based on this inspection, the NGFW creates a comprehensive, application-aware security policy. It then instructs the Arista fabric to enforce that policy at wire speed for all subsequent, similar flows,” Kotamraju wrote. “This ‘inspect-once, enforce-many’ model delivers granular zero trust security without the performance bottlenecks of hairpinning all traffic through a firewall or forcing a costly, disruptive network redesign.” The second capability is a dynamic quarantine feature that enables the Palo Alto NGFWs to identify evasive threats using Cloud-Delivered Security Services (CDSS). “These services, such as Advanced WildFire for zero-day malware and Advanced Threat Prevention for unknown exploits, leverage global threat intelligence to detect and block attacks that traditional security misses,” Kotamraju wrote. The Arista fabric can intelligently offload trusted, high-bandwidth “elephant flows” from the firewall after inspection, freeing it to focus on high-risk traffic. When a threat is detected, the NGFW signals Arista CloudVision, which programs the network switches to automatically quarantine the compromised workload at hardware line-rate, according to Kotamraju: “This immediate response halts the lateral spread of a threat without creating a performance bottleneck or requiring manual intervention.” The third feature is unified policy orchestration, where Palo Alto Networks’ management plane centralizes zone-based and microperimeter policies, and CloudVision MSS responds with the offload and enforcement of Arista switches. “This treats the entire geo-distributed network as a single logical switch, allowing workloads to be migrated freely across cloud networks and security domains,” Srikanta and Barbieri wrote. Lastly, the Arista Validated Design (AVD) data models enable network-as-a-code, integrating with CI/CD pipelines. 
AVDs can also be generated by Arista’s AVA (Autonomous Virtual Assist) AI agents that incorporate best practices, testing, guardrails, and generated configurations. “Our integration directly resolves this conflict by creating a clean architectural separation that decouples the network fabric from security policy. This allows the NetOps team (managing the Arista

Read More »

AMD outlines ambitious plan for AI-driven data centers

“There are very beefy workloads that you must have that performance for to run the enterprise,” he said. “The Fortune 500 mainstream enterprise customers are now … adopting Epyc faster than anyone. We’ve seen a 3x adoption this year. And what that does is drives back to the on-prem enterprise adoption, so that the hybrid multi-cloud is end-to-end on Epyc.” One of the key focus areas for AMD’s Epyc strategy has been our ecosystem build out. It has almost 180 platforms, from racks to blades to towers to edge devices, and 3,000 solutions in the market on top of those platforms. One of the areas where AMD pushes into the enterprise is what it calls industry or vertical workloads. “These are the workloads that drive the end business. So in semiconductors, that’s telco, it’s the network, and the goal there is to accelerate those workloads and either driving more throughput or drive faster time to market or faster time to results. And we almost double our competition in terms of faster time to results,” said McNamara. And it’s paying off. McNamara noted that over 60% of the Fortune 100 are using AMD, and that’s growing quarterly. “We track that very, very closely,” he said. The other question is are they getting new customer acquisitions, customers with Epyc for the first time? “We’ve doubled that year on year.” AMD didn’t just brag, it laid out a road map for the next two years, and 2026 is going to be a very busy year. That will be the year that new CPUs, both client and server, built on the Zen 6 architecture begin to appear. On the server side, that means the Venice generation of Epyc server processors. Zen 6 processors will be built on 2 nanometer design generated by (you guessed

Read More »

Building the Regional Edge: DartPoints CEO Scott Willis on High-Density AI Workloads in Non-Tier-One Markets

When DartPoints CEO Scott Willis took the stage on “the Distributed Edge” panel at the 2025 Data Center Frontier Trends Summit, his message resonated across a room full of developers, operators, and hyperscale strategists: the future of AI infrastructure will be built far beyond the nation’s tier-one metros. On the latest episode of the Data Center Frontier Show, Willis expands on that thesis, mapping out how DartPoints has positioned itself for a moment when digital infrastructure inevitably becomes more distributed, and why that moment has now arrived. DartPoints’ strategy centers on what Willis calls the “regional edge”—markets in the Midwest, Southeast, and South Central regions that sit outside traditional cloud hubs but are increasingly essential to the evolving AI economy. These are not tower-edge micro-nodes, nor hyperscale mega-campuses. Instead, they are regional data centers designed to serve enterprises with colocation, cloud, hybrid cloud, multi-tenant cloud, DRaaS, and backup workloads, while increasingly accommodating the AI-driven use cases shaping the next phase of digital infrastructure. As inference expands and latency-sensitive applications proliferate, Willis sees the industry’s momentum bending toward the very markets DartPoints has spent years cultivating. Interconnection as Foundation for Regional AI Growth A key part of the company’s differentiation is its interconnection strategy. Every DartPoints facility is built to operate as a deeply interconnected environment, drawing in all available carriers within a market and stitching sites together through a regional fiber fabric. Willis describes fiber as the “nervous system” of the modern data center, and for DartPoints that means creating an interconnection model robust enough to support a mix of enterprise cloud, multi-site disaster recovery, and emerging AI inference workloads. 
The company is already hosting latency-sensitive deployments in select facilities—particularly inference AI and specialized healthcare applications—and Willis expects such deployments to expand significantly as regional AI architectures become more widely

Read More »

Key takeaways from Cisco Partner Summit

Brian Ortbals, senior vice president at World Wide Technology, one of Cisco’s biggest and most important partners, stated: “Cisco engaged partners early in the process and took our feedback along the way. We believe now is the right time for these changes as it will enable us to capitalize on the changes in the market.” The reality is, the more successful its more-than-half-a-million partners are, the more successful Cisco will be.

Platform approach is coming together

When Jeetu Patel took the reins as chief product officer, one of his goals was to make the Cisco portfolio a “force multiplier.” Patel has stated repeatedly that, historically, Cisco acted more as a technology holding company with good products in networking, security, collaboration, data center and other areas. In that model, product breadth was not an advantage, as everything had to be sold as “best of breed,” which is a tough ask of the salesforce and partner community. Since then, there have been many examples of the portfolio coming together to create products that leverage the breadth of the platform. The latest is the Unified Edge appliance, an all-in-one solution that brings together compute, networking, storage and security. Cisco has been aggressive with AI products in the data center, and Cisco Unified Edge complements that work with a device designed to bring AI to edge locations. This is ideally suited for retail, manufacturing, healthcare, factories and other industries where it’s more cost-effective and performant to run AI where the data lives.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading existing data centers or opening new ones to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote, between them, $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.)

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »