
How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs)




Very small language models (SLMs) can outperform leading large language models (LLMs) in reasoning tasks, according to a new study by Shanghai AI Laboratory. The authors show that with the right tools and test-time scaling techniques, an SLM with 1 billion parameters can outperform a 405B LLM on complicated math benchmarks.

The ability to deploy SLMs on complex reasoning tasks could prove very useful as enterprises look for new ways to use these models in different environments and applications.

Test-time scaling explained

Test-time scaling (TTS) is the process of giving LLMs extra compute cycles during inference to improve their performance on various tasks. Leading reasoning models, such as OpenAI o1 and DeepSeek-R1, use “internal TTS,” which means they are trained to “think” slowly by generating a long string of chain-of-thought (CoT) tokens.

An alternative approach is “external TTS,” where model performance is enhanced with (as the name implies) outside help. External TTS is suitable for repurposing existing models for reasoning tasks without further fine-tuning them. An external TTS setup is usually composed of a “policy model,” which is the main LLM generating the answer, and a process reward model (PRM) that evaluates the policy model’s answers. These two components are coupled together through a sampling or search method.

The easiest setup is “best-of-N,” where the policy model generates multiple answers and the PRM selects one or more best answers to compose the final response. More advanced external TTS methods use search. In “beam search,” the model breaks the answer down into multiple steps.

For each step, it samples multiple answers and runs them through the PRM. It then chooses one or more suitable candidates and generates the next step of the answer. In “diverse verifier tree search” (DVTS), the model generates several independent branches of answers to create a more diverse set of candidate responses before synthesizing them into a final answer.
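The sampling loops described above can be sketched as follows. This is a minimal illustration under stated assumptions: `policy_model` (samples one complete answer), `policy_step` (samples one next reasoning step), and `prm_score` (scores a candidate) are hypothetical stand-ins for a real policy model and PRM, not the paper’s actual API:

```python
def best_of_n(policy_model, prm_score, question, n=8):
    """Best-of-N: sample n complete answers from the policy model,
    then return the one the process reward model (PRM) scores highest."""
    candidates = [policy_model(question) for _ in range(n)]
    return max(candidates, key=lambda ans: prm_score(question, ans))

def beam_search(policy_step, prm_score, question,
                beam_width=2, samples_per_step=4, max_steps=3):
    """Step-wise beam search: extend each partial answer (beam) with
    several sampled next steps, score the partial answers with the PRM,
    and keep only the top-scoring beams before generating the next step."""
    beams = [[]]  # each beam is the list of reasoning steps so far
    for _ in range(max_steps):
        expansions = []
        for beam in beams:
            for _ in range(samples_per_step):
                next_step = policy_step(question, beam)
                expansions.append(beam + [next_step])
        # keep the beam_width partial answers the PRM scores highest
        expansions.sort(key=lambda b: prm_score(question, b), reverse=True)
        beams = expansions[:beam_width]
    return beams[0]  # highest-scoring chain of reasoning steps
```

DVTS follows the same pattern as beam search but splits the initial samples into independent subtrees, so the retained candidates stay diverse rather than collapsing onto one line of reasoning.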

Different test-time scaling methods (source: arXiv)

What is the right scaling strategy?

Choosing the right TTS strategy depends on multiple factors. The study authors carried out a systematic investigation of how different policy models and PRMs affect the efficiency of TTS methods.

Their findings show that efficiency is largely dependent on the policy and PRM models. For example, for small policy models, search-based methods outperform best-of-N. However, for large policy models, best-of-N is more effective because the models have better reasoning capabilities and don’t need a reward model to verify every step of their reasoning.

Their findings also show that the right TTS strategy depends on the difficulty of the problem. For example, for small policy models with fewer than 7B parameters, best-of-N works better for easy problems, while beam search works better for harder problems. For policy models that have between 7B and 32B parameters, diverse tree search performs well for easy and medium problems, and beam search works best for hard problems. But for large policy models (72B parameters and more), best-of-N is the optimal method for all difficulty levels.
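The size-and-difficulty pattern above can be condensed into a small lookup. Note this is a paraphrase of the reported findings for illustration only, not the paper’s actual compute-optimal selection procedure, and it simplifies the unreported 32B–72B range to best-of-N:

```python
def choose_tts_strategy(policy_params_billions, difficulty):
    """Map (policy model size, problem difficulty) to the TTS method the
    study reports working best; difficulty is 'easy', 'medium', or 'hard'."""
    if policy_params_billions < 7:
        return "best-of-N" if difficulty == "easy" else "beam search"
    if policy_params_billions <= 32:
        return "beam search" if difficulty == "hard" else "DVTS"
    return "best-of-N"  # large models verify well enough on their own

print(choose_tts_strategy(3, "hard"))  # → beam search
```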

Why small models can beat large models

SLMs outperform large models at MATH and AIME-24 (source: arXiv)

Based on these findings, developers can create compute-optimal TTS strategies that take into account the policy model, PRM and problem difficulty to make the best use of compute budget to solve reasoning problems.

For example, the researchers found that a Llama-3.2-3B model with the compute-optimal TTS strategy outperforms the Llama-3.1-405B on MATH-500 and AIME24, two complicated math benchmarks. This shows that an SLM can outperform a model that is 135X larger when using the compute-optimal TTS strategy.

In other experiments, they found that a Qwen2.5 model with 500 million parameters can outperform GPT-4o with the right compute-optimal TTS strategy. Using the same strategy, the 1.5B distilled version of DeepSeek-R1 outperformed o1-preview and o1-mini on MATH-500 and AIME24.

When accounting for both training and inference compute budgets, the findings show that with compute-optimal scaling strategies, SLMs can outperform larger models while using 100-1000X fewer FLOPS.

The researchers’ results show that compute-optimal TTS significantly enhances the reasoning capabilities of language models. However, as the policy model grows larger, the improvement of TTS gradually decreases. 

“This suggests that the effectiveness of TTS is directly related to the reasoning ability of the policy model,” the researchers write. “Specifically, for models with weak reasoning abilities, scaling test-time compute leads to a substantial improvement, whereas for models with strong reasoning abilities, the gain is limited.”

The study validates that SLMs can perform better than larger models when applying compute-optimal test-time scaling methods. While this study focuses on math benchmarks, the researchers plan to expand their study to other reasoning tasks such as coding and chemistry.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

NRF 2026: HPE expands network, server products for retailers

The package also integrates information from HPE Aruba Networking User Experience Insight sensors and agents, which now include support for WiFi 7 networks. The combination can measure end-user activity and allow IT teams to baseline network performance, continuously test network health, track trends, and plan for device growth and AI-native

Read More »

Italy fines Cloudflare for refusing to block pirate sites

Italy’s communications authority AGCOM has fined Cloudflare €14.2 million for refusing to block pirate sites via its public DNS service 1.1.1.1, in accordance with the country’s controversial Piracy Shield law, reports Ars Technica. The law, which was introduced in 2024, requires network operators and DNS services to block websites and

Read More »

Global tech-sector layoffs surpass 244,000 in 2025

The RationalFX report summarizes the U.S. states with the highest tech layoffs in 2025: California: 73,499 jobs (43.08%) Washington: 42,221 jobs (24.74%) New York: 26,900 jobs (15.8%) Texas: 9,816 jobs (6%) Massachusetts: 3,477 jobs Intel leads workforce reductions Intel contributed the single largest number of layoffs in 2025, according to

Read More »

What enterprises think about quantum computing

And speaking of chips, our third point is that the future of quantum computing depends on improvement of the chips. There are already some heady advances claimed by chip startups, but the hype is going to outrun the reality for some time. Eventually, quantum computing will be, like digital computing,

Read More »

Scarborough FPU Arrives in Australia

Woodside Energy Group Ltd said Tuesday the Scarborough Energy Project’s floating production unit (FPU) had arrived at the project site offshore Western Australia. The project includes the development of the Scarborough gas field off the coast of Karratha, the construction of a second gas processing train for Pluto LNG with a capacity of five MMtpa and modifications to Pluto Train 1, according to Woodside. The FPU, built in China by Houston, Texas-headquartered McDermott International Ltd, will process gas from the field. Excluding train 1 modifications, Scarborough Energy was 91 percent complete at the end of the third quarter, according to Woodside’s quarterly report October 22, 2025. “Our focus now shifts to the hook-up and commissioning phase in preparation for production, and ultimately, first LNG cargo which is on track for the second half of this year”, Woodside acting chief executive Liz Westcott said in a statement on the company’s website Tuesday. Woodside called the FPU “one of the largest semisubmersible facilities ever constructed”. The vessel totals about 70,000 metric tons, according to Woodside. “It features advanced emissions-reduction systems and is designed to treat and compress gas for export through the trunkline”, the statement said. “It can also accommodate future tie-ins to support the development of nearby fields”. The Perth-based company expects the project to produce up to eight million metric tons a year of liquefied natural gas and supply 225 terajoules per day to the Western Australian market. Court Clearance Last year Australia’s Federal Court upheld regulatory approval of the environmental plan (EP) for Scarborough Energy, in a challenge put up by Doctors for the Environment (Australia) Inc (DEA). In a statement August 22, 2025, about the court win, Woodside noted the EP, approved by the National Offshore Petroleum Safety and Environmental Management Authority (NOPSEMA) in February 2025, represented the last

Read More »

Oil Jumps as Iran Tensions Escalate

Oil rose to the highest level since early December as unrest in Iran raises the specter of supply disruptions from OPEC’s fourth-biggest producer, with the Wall Street Journal reporting that President Donald Trump is leaning toward striking the country. West Texas Intermediate settled above $59 a barrel on Monday after jumping more than 6% over the past three sessions. Trump said Tehran had offered to enter negotiations with Washington over its yearslong nuclear program. But he is leaning toward authorizing military strikes against the Middle Eastern country over its treatment of protesters, the newspaper said, citing US officials familiar with the matter. Fresh political or military unrest in Iran could threaten disruption to the country’s roughly 3.3 million barrels-per-day oil production. Iran’s foreign minister repeated government claims that rioters and terrorists killed police and civilians, while footage was broadcast on state TV saying calm had been restored nationwide. “Traders must now balance odds of a smooth transition to regime change, odds of a messy transition potentially impacting oil production and exports, odds of a military confrontation or miscalculation, and odds the regime change may pivot towards a deal on US terms, which would bear the most negative implications for energy markets,” said Dan Ghali, a commodity strategist at TD Securities. The possibility of a disruption to Iran’s daily exports has tempered concerns over a global glut that caused a slump in prices and made investors increasingly bearish. The scale of risk has shown up clearest in options markets, where the skew toward bullish calls is the biggest for US crude futures since June and volatility is surging. The two weeks of protests in the country are the most significant challenge to Supreme Leader Ayatollah Ali Khamenei since a nationwide uprising in 2022. It follows a surge in oil prices during

Read More »

Democrats Put Alaska Senate Seat in Play

Former Alaska Representative Mary Peltola launched a campaign Monday to challenge Republican Dan Sullivan for one of Alaska’s Senate seats, putting the race in play for Democrats in November’s midterms.  “Systemic change is the only way to bring down grocery costs, save our fisheries, lower energy prices and build new housing Alaskans can afford,” Peltola said in a video announcing her candidacy. “No one from the Lower 48 is coming to save us, but I know this in my bones, there is no group of people more ready to save ourselves than Alaskans.”  Peltola held Alaska’s sole House seat until narrowly losing to a Republican in 2024. Democrats for months tried to recruit Peltola, who was also considering a bid for governor.  While President Donald Trump won Alaska by 13 points in 2024, the state’s down-ballot politics are often more issue-based than ideological. Peltola, an Alaskan Native, ran her previous campaigns on “fish, family and freedom” as a way to frame positions on issues like health care, taxes and abortion that are in line with the Democratic mainstream — as well as support for oil drilling and gun ownership that can be at odds with the rest of her party.  Early polling shows Sullivan vulnerable to a Peltola challenge, and Republicans across the country are preparing for midterms that have historically been difficult for the party in power. Democrats need to net four seats to take back the Senate majority.  Alaskans who get health insurance through the Affordable Care Act face some of the highest premiums in America after this month’s expiration of Biden-era subsidies. Sullivan, who had previously sought to repeal the ACA, voted last month to start debate on a Democratic proposal to extend expanded ACA subsidies for three years. The state, which already faces high prices for

Read More »

Trump Says He’s Inclined to Exclude XOM From Venezuela

(Update) January 12, 2026, 4:24 PM GMT: Article updated. Adds shares in the fifth paragraph. President Donald Trump signaled he’s leaning toward excluding Exxon Mobil Corp. from his push for US oil majors to rebuild Venezuela’s petroleum industry, saying he was displeased with the company’s response to his initiative. “I’d probably be inclined to keep Exxon out,” Trump told reporters late Sunday aboard the presidential plane on the way back to Washington from his Florida estate. “I didn’t like their response. They’re playing too cute.” Trump appeared to be referring to a White House meeting on Friday with almost 20 oil industry executives, where Exxon Chief Executive Officer Darren Woods expressed some of the strongest reservations and described Venezuela as “uninvestable.” The president’s latest comments also highlight the challenge of persuading the US oil industry to commit to an ambitious reconstruction of Venezuela’s once-mighty energy sector, which he announced within hours of the capture of former President Nicolás Maduro. Exxon shares fell as much as 1.7% Monday as crude futures were little changed.  Reviving the oil industry and undoing years of underinvestment and mismanagement would, by some estimates, require $100 billion and take a decade. Despite US moves over the past week to take full control of Venezuelan oil exports, many questions remain over how major investment on the ground could be guaranteed over such a protracted period in a country beset by corruption and insecurity. When asked Sunday which backstops or guarantees he had told oil companies he was willing to provide, Trump said: “Guarantees that they’re going to be safe, that there’s going to be no problem. And there won’t be.” Trump didn’t specify in what way he might seek to exclude Exxon. The company didn’t immediately respond to a request for comment outside of US office hours.

Read More »

North America Adds Almost 100 Rigs Week on Week

North America added 94 rigs week on week, according to Baker Hughes’ latest North America rotary rig count, which was published on January 9. Although the total U.S. rig count dropped by two week on week, the total Canada rig count increased by 96 during the same period, pushing the total North America rig count up to 741, comprising 544 rigs from the U.S. and 197 rigs from Canada, the count outlined. Of the total U.S. rig count of 544, 525 rigs are categorized as land rigs, 16 are categorized as offshore rigs, and three are categorized as inland water rigs. The total U.S. rig count is made up of 409 oil rigs, 124 gas rigs, and 11 miscellaneous rigs, according to Baker Hughes’ count, which revealed that the U.S. total comprises 475 horizontal rigs, 57 directional rigs, and 12 vertical rigs. Week on week, the U.S. land rig count dropped by two, and its offshore and inland water rig counts remained unchanged, Baker Hughes highlighted. The U.S. oil rig count dropped by three week on week, its gas rig count dropped by one, and its miscellaneous rig count increased by two week on week, the count showed. The U.S. horizontal rig count dropped by one, its vertical rig count dropped by two, and its directional rig count increased by one, week on week, the count revealed. A major state variances subcategory included in the rig count showed that, week on week, Louisiana dropped three rigs, and New Mexico, North Dakota, Texas, and Wyoming each dropped one rig. Utah added four rigs and Colorado added one rig week on week, the count highlighted. A major basin variances subcategory included in the rig count showed that, week on week, the Permian basin dropped three rigs, the Haynesville, Mississippian, and Williston basins

Read More »

Intensity, Rainbow Near FID on North Dakota Gas Pipeline

Intensity Infrastructure Partners LLC and power producer Rainbow Energy Center LLC have indicated they are nearing a positive final investment decision (FID) on a new pipeline project to bring Bakken natural gas to eastern North Dakota. “[T]he firm transportation commitments contained in executed precedent agreements are sufficient to underpin the decision to advance Phase I of their 36-inch natural gas pipeline in North Dakota, reflecting growing confidence in the region’s long-term power and industrial demand outlook”, the companies said in a joint statement. “This approach establishes a scalable, dispatchable power and gas delivery hub capable of adapting to evolving market conditions, supporting sustained data center growth, grid reliability needs and long-term industrial development across North Dakota”. “The system will provide reliable natural gas supply through multiple receipt points, including Northern Border Pipeline, WBI Energy’s existing transmission and storage network, and direct connections to six Bakken natural gas processing plants, creating a highly integrated supply platform from Bakken and Canadian production”, the online statement added. “The pipeline is designed to operate without compression fuel surcharges, reducing operational complexity while enhancing reliability and tariff transparency for shippers. “Uncommitted capacity on phase I supports incremental gas-fired generation along the planned pipeline corridor and at Coal Creek Station, leveraging existing power transmission infrastructure, a strategic geographic location and a proven operating platform. “The 36-inch pipeline enables future throughput increases without the need for duplicative greenfield infrastructure as demand continues to develop”. 
Rainbow chief executive Stacy Tschider said, “By leveraging established assets like Coal Creek and integrating directly with basin supply and interstate systems, this project is positioned to meet near-term needs while remaining expandable for the next generation of load growth”. The project would proceed in two phases. Phase 1 would build a 136-mile, 36-inch pipeline with a capacity of about 1.1 million dekatherms a day (Dthd). The phase 1 line would

Read More »

AI, edge, and security: Shaping the need for modern infrastructure management

The rapidly evolving IT landscape, driven by artificial intelligence (AI), edge computing, and rising security threats, presents unprecedented challenges in managing compute infrastructure. Traditional management tools struggle to provide the necessary scalability, visibility, and automation to keep up with business demand, leading to inefficiencies and increased business risk. Yet organizations need their IT departments to be strategic business partners that enable innovation and drive growth. To realize that goal, IT leaders should rethink the status quo and free up their teams’ time by adopting a unified approach to managing infrastructure that supports both traditional and AI workloads. It’s a strategy that enables companies to simplify IT operations and improve IT job satisfaction. 5 IT management challenges of the AI era Cisco recently commissioned Forrester Consulting to conduct a Total Economic Impact™ analysis of Cisco Intersight. This IT operations platform provides visibility, control, and automation capabilities for the Cisco Unified Computing System (Cisco UCS), including Cisco converged, hyperconverged, and AI-ready infrastructure solutions across data centers, colocation facilities, and edge environments. Intersight uses a unified policy-driven approach to infrastructure management and integrates with leading operating systems, storage providers, hypervisors, and third-party IT service management and security tools. The Forrester study first uncovered the issues IT groups are facing: Difficulty scaling: Manual, repetitive processes cause lengthy IT compute infrastructure build and deployment times. This challenge is particularly acute for organizations that need to evolve infrastructure to support traditional and AI workloads across data centers and distributed edge environments. Architectural specialization and AI workloads: AI is altering infrastructure requirements, Forrester found.  
Companies design systems to support specific AI workloads — such as data preparation, model training, and inferencing — and each demands specialized compute, storage, and networking capabilities. Some require custom chip sets and purpose-built infrastructure, such as for edge computing and low-latency applications.

Read More »

DCF Poll: Analyzing AI Data Center Growth

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #1796c1 !important; border-color: #1796c1 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #1796c1 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #1796c1 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #1796c1 !important; border-color: #1796c1 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #1796c1 !important; border-color: #1796c1 !important; } Coming out of 2025, AI data center development remains defined by momentum. But momentum is not the same as certainty. Behind the headlines, operators, investors, utilities, and policymakers are all testing the assumptions that carried projects forward over the past two years, from power availability and capital conditions to architecture choices and community response. Some will hold. Others may not. To open our 2026 industry polling, we’re taking a closer look at which pillars of AI data center growth are under the most pressure. What assumption about AI data center growth feels most fragile right now?

Read More »

JLL’s 2026 Global Data Center Outlook: Navigating the AI Supercycle, Power Scarcity and Structural Market Transformation

Sovereign AI and National Infrastructure Policy JLL frames artificial intelligence infrastructure as an emerging national strategic asset, with sovereign AI initiatives representing an estimated $8 billion in cumulative capital expenditure by 2030. While modest relative to hyperscale investment totals, this segment carries outsized strategic importance. Data localization mandates, evolving AI regulation, and national security considerations are increasingly driving governments to prioritize domestic compute capacity, often with pricing premiums reaching as high as 60%. Examples cited across Europe, the Middle East, North America, and Asia underscore a consistent pattern: digital sovereignty is no longer an abstract policy goal, but a concrete driver of data center siting, ownership structures, and financing models. In practice, sovereign AI initiatives are accelerating demand for locally controlled infrastructure, influencing where capital is deployed and how assets are underwritten. For developers and investors, this shift introduces a distinct set of considerations. Sovereign projects tend to favor jurisdictional alignment, long-term tenancy, and enhanced security requirements, while also benefiting from regulatory tailwinds and, in some cases, direct state involvement. As AI capabilities become more tightly linked to economic competitiveness and national resilience, policy-driven demand is likely to remain a durable (if specialized) component of global data center growth. Energy and Sustainability as the Central Constraint Energy availability emerges as the report’s dominant structural constraint. In many major markets, average grid interconnection timelines now extend beyond four years, effectively decoupling data center development schedules from traditional utility planning cycles. 
As a result, operators are increasingly pursuing alternative energy strategies to maintain project momentum, including: Behind-the-meter generation Expanded use of natural gas, particularly in the United States Private-wire renewable energy projects Battery energy storage systems (BESS) JLL points to declining battery costs, seen falling below $90 per kilowatt-hour in select deployments, as a meaningful enabler of grid flexibility, renewable firming, and

Read More »

SoftBank, DigitalBridge, and Stargate: The Next Phase of OpenAI’s Infrastructure Strategy

OpenAI framed Stargate as an AI infrastructure platform; a mechanism to secure long-duration, frontier-scale compute across both training and inference by coordinating capital, land, power, and supply chain with major partners. When OpenAI announced Stargate in January 2025, the headline commitment was explicit: an intention to invest up to $500 billion over four to five years to build new AI infrastructure in the U.S., with $100 billion targeted for near-term deployment. The strategic backdrop in 2025 was straightforward. OpenAI’s model roadmap—larger models, more agents, expanded multimodality, and rising enterprise workloads—was driving a compute curve increasingly difficult to satisfy through conventional cloud procurement alone. Stargate emerged as a form of “control plane” for: Capacity ownership and priority access, rather than simply renting GPUs. Power-first site selection, encompassing grid interconnects, generation, water access, and permitting. A broader partner ecosystem beyond Microsoft, while still maintaining a working relationship with Microsoft for cloud capacity where appropriate. 2025 Progress: From Launch to Portfolio Buildout January 2025: Stargate Launches as a National-Scale Initiative OpenAI publicly launched Project Stargate on Jan. 21, 2025, positioning it as a national-scale AI infrastructure initiative. At this early stage, the work was less about construction and more about establishing governance, aligning partners, and shaping a public narrative in which compute was framed as “industrial policy meets real estate meets energy,” rather than simply an exercise in buying more GPUs. July 2025: Oracle Partnership Anchors a 4.5-GW Capacity Step On July 22, 2025, OpenAI announced that Stargate had advanced through a partnership with Oracle to develop 4.5 gigawatts of additional U.S. data center capacity. The scale of the commitment marked a clear transition from conceptual ambition to site- and megawatt-level planning. 
A figure of this magnitude reshaped the narrative. At 4.5 GW, Stargate forced alignment across transformers, transmission upgrades, switchgear, long-lead cooling

Read More »

Lenovo unveils purpose-built AI inferencing servers

There is also the Lenovo ThinkSystem SR650i, which offers high-density GPU computing power for faster AI inference and is intended for easy installation in existing data centers to work with existing systems. Finally, there is the Lenovo ThinkEdge SE455i for smaller, edge locations such as retail outlets, telecom sites, and industrial facilities. Its compact design allows for low-latency AI inference close to where data is generated and is rugged enough to operate in temperatures ranging from -5°C to 55°C. All of the servers include Lenovo’s Neptune air- and liquid-cooling technology and are available through the TruScale pay-as-you-go pricing model. In addition to the new hardware, Lenovo introduced new AI Advisory Services with AI Factory Integration. This service gives access to professionals for identifying, deploying, and managing best-fit AI Inferencing servers. It also launched Premier Support Plus, a service that gives professional assistance in data center management, freeing up IT resources for more important projects.

Read More »

Samsung warns of memory shortages driving industry-wide price surge in 2026

SK Hynix reported during its October earnings call that its HBM, DRAM, and NAND capacity is “essentially sold out” for 2026, while Micron recently exited the consumer memory market entirely to focus on enterprise and AI customers. Enterprise hardware costs surge The supply constraints have translated directly into sharp price increases across enterprise hardware. Samsung raised prices for 32GB DDR5 modules to $239 from $149 in September, a 60% increase, while contract pricing for DDR5 has surged more than 100%, reaching $19.50 per unit compared to around $7 earlier in 2025. DRAM prices have already risen approximately 50% year to date and are expected to climb another 30% in Q4 2025, followed by an additional 20% in early 2026, according to Counterpoint Research. The firm projected that DDR5 64GB RDIMM modules, widely used in enterprise data centers, could cost twice as much by the end of 2026 as they did in early 2025. Gartner forecast DRAM prices to increase by 47% in 2026 due to significant undersupply in both traditional and legacy DRAM markets, Chauhan said. Procurement leverage shifts to hyperscalers The pricing pressures and supply constraints are reshaping the power dynamics in enterprise procurement. For enterprise procurement, supplier size no longer guarantees stability. “As supply becomes more contested in 2026, procurement leverage will hinge less on volume and more on strategic alignment,” Rawat said. Hyperscale cloud providers secure supply through long-term commitments, capacity reservations, and direct fab investments, obtaining lower costs and assured availability. Mid-market firms rely on shorter contracts and spot sourcing, competing for residual capacity after large buyers claim priority supply.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.


John Deere unveils more autonomous farm machines to address skilled labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. Moline, Illinois-based John Deere has been in business for 187 years, yet the non-tech company has become a regular at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, even as the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.) John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do


2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. That makes it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize in their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier LLMs themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies and recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that model providers are researching is using the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to


OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement learning and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models using these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model, because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and the U.S. National Institute of Standards and Technology (NIST), which had all released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see whether knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
