
When AI reasoning goes wrong: Microsoft Research shows more tokens can mean more problems

Large language models (LLMs) are increasingly capable of complex reasoning through “inference-time scaling,” a set of techniques that allocate more computational resources during inference to generate answers. However, a new study from Microsoft Research reveals that the effectiveness of these scaling methods isn’t universal. Performance boosts vary significantly across different models, tasks and problem complexities.

The core finding is that simply throwing more compute at a problem during inference doesn’t guarantee better or more efficient results. The findings can help enterprises better understand cost volatility and model reliability as they look to integrate advanced AI reasoning into their applications.

Putting scaling methods to the test

The Microsoft Research team conducted an extensive empirical analysis across nine state-of-the-art foundation models. These included “conventional” models such as GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Pro and Llama 3.1 405B, as well as models specifically fine-tuned for enhanced reasoning through inference-time scaling: OpenAI’s o1 and o3-mini, Anthropic’s Claude 3.7 Sonnet, Google’s Gemini 2 Flash Thinking and DeepSeek R1.

They evaluated these models using three distinct inference-time scaling approaches:

  1. Standard Chain-of-Thought (CoT): The basic method where the model is prompted to answer step-by-step.
  2. Parallel Scaling: The model generates multiple independent answers for the same question and uses an aggregator (like majority vote or selecting the best-scoring answer) to arrive at a final result.
  3. Sequential Scaling: The model iteratively generates an answer and uses feedback from a critic (potentially from the model itself) to refine the answer in subsequent attempts.
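
The parallel and sequential approaches can be sketched in a few lines of Python. This is a minimal illustration rather than the study's implementation: `generate` and `critique` are hypothetical stand-ins for model calls, and majority vote is only one of the aggregators mentioned above.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical stand-ins for model calls; neither name comes from the study.
Generate = Callable[[str], str]                  # prompt -> answer
Critique = Callable[[str, str], Optional[str]]   # (question, answer) -> feedback, or None if accepted


def parallel_scaling(question: str, generate: Generate, n: int = 5) -> str:
    """Sample n independent answers and aggregate them by majority vote."""
    answers = [generate(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def sequential_scaling(
    question: str, generate: Generate, critique: Critique, max_rounds: int = 3
) -> str:
    """Refine a single answer iteratively using feedback from a critic."""
    answer = generate(question)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if feedback is None:  # the critic accepts the answer
            break
        # Feed the previous attempt and the critic's feedback back to the model.
        answer = generate(
            f"{question}\nPrevious answer: {answer}\nFeedback: {feedback}"
        )
    return answer
```

Both functions multiply the number of model calls per question, which is why the study tracks token cost alongside accuracy.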

These approaches were tested on eight challenging benchmark datasets covering a wide range of tasks that benefit from step-by-step problem-solving: math and STEM reasoning (AIME, Omni-MATH, GPQA), calendar planning (BA-Calendar), NP-hard problems (3SAT, TSP), navigation (Maze) and spatial reasoning (SpatialMap).

Several benchmarks included problems with varying difficulty levels, allowing for a more nuanced understanding of how scaling behaves as problems become harder.

“The availability of difficulty tags for Omni-MATH, TSP, 3SAT, and BA-Calendar enables us to analyze how accuracy and token usage scale with difficulty in inference-time scaling, which is a perspective that is still underexplored,” the researchers wrote in the paper detailing their findings.

The researchers evaluated the Pareto frontier of LLM reasoning by analyzing both accuracy and the computational cost (i.e., the number of tokens generated). This helps identify how efficiently models achieve their results. 

Inference-time scaling Pareto frontier. Credit: arXiv
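
As a rough sketch of what an accuracy-versus-cost Pareto frontier means in practice, the snippet below keeps only the results that no cheaper result matches or beats on accuracy. The `Result` fields and the choice of average tokens as the cost proxy are assumptions made for illustration, not the paper's code.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str         # hypothetical label
    avg_tokens: float  # cost proxy: average tokens generated per problem
    accuracy: float    # fraction of problems solved


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Return the results that are not dominated on (lower tokens, higher accuracy)."""
    # Sort by cost ascending; on equal cost, put the more accurate result first.
    ordered = sorted(results, key=lambda r: (r.avg_tokens, -r.accuracy))
    frontier: list[Result] = []
    best_accuracy = float("-inf")
    for r in ordered:
        # Keep a point only if it is more accurate than every cheaper point.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier
```

A model that sits off this frontier spends more tokens than another model that is at least as accurate.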

They also introduced the “conventional-to-reasoning gap” measure, which compares the best possible performance of a conventional model (using an ideal “best-of-N” selection) against the average performance of a reasoning model, estimating the potential gains achievable through better training or verification techniques.
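
One way to read this measure is sketched below, assuming each model's results are stored as a list of per-problem attempt outcomes. The function names and the sign convention (reasoning average minus conventional best-of-N) are assumptions for illustration; the article does not specify how the paper computes the gap.

```python
def best_of_n_accuracy(attempts: list[list[bool]]) -> float:
    """Fraction of problems where at least one of the N attempts is correct."""
    return sum(any(problem) for problem in attempts) / len(attempts)


def average_accuracy(attempts: list[list[bool]]) -> float:
    """Mean correctness over all attempts, averaged across problems."""
    return sum(sum(problem) / len(problem) for problem in attempts) / len(attempts)


def conventional_to_reasoning_gap(
    conventional: list[list[bool]], reasoning: list[list[bool]]
) -> float:
    """Assumed convention: positive means the reasoning model's average still
    beats the conventional model even with ideal best-of-N selection."""
    return average_accuracy(reasoning) - best_of_n_accuracy(conventional)
```

Under this reading, a small gap suggests that better verification or selection alone could close most of the distance between the two model types.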

More compute isn’t always the answer

The study provided several crucial insights that challenge common assumptions about inference-time scaling:

Benefits vary significantly: While models tuned for reasoning generally outperform conventional ones on these tasks, the degree of improvement varies greatly depending on the specific domain and task. Gains often diminish as problem complexity increases. For instance, performance improvements seen on math problems didn’t always translate equally to scientific reasoning or planning tasks.

Token inefficiency is rife: The researchers observed high variability in token consumption, even between models achieving similar accuracy. For example, on the AIME 2025 math benchmark, DeepSeek R1 used more than five times as many tokens as Claude 3.7 Sonnet for roughly comparable average accuracy.

More tokens do not always lead to higher accuracy: Contrary to the intuitive idea that longer reasoning chains mean better reasoning, the study found this isn’t always true. “Surprisingly, we also observe that longer generations relative to the same model can sometimes be an indicator of models struggling, rather than improved reflection,” the paper states. “Similarly, when comparing different reasoning models, higher token usage is not always associated with better accuracy. These findings motivate the need for more purposeful and cost-effective scaling approaches.”

Cost nondeterminism: Perhaps most concerning for enterprise users, repeated queries to the same model for the same problem can result in highly variable token usage. This means the cost of running a query can fluctuate significantly, even when the model consistently provides the correct answer. 

Variance in response length (spikes show smaller variance). Credit: arXiv

The potential in verification mechanisms: Scaling performance consistently improved across all models and benchmarks when simulated with a “perfect verifier” (using the best-of-N results). 

Conventional models sometimes match reasoning models: By significantly increasing inference calls (up to 50x more in some experiments), conventional models like GPT-4o could sometimes approach the performance levels of dedicated reasoning models, particularly on less complex tasks. However, these gains diminished rapidly in highly complex settings, indicating that brute-force scaling has its limits.

On some tasks, the accuracy of GPT-4o continues to improve with parallel and sequential scaling. Credit: arXiv

Implications for the enterprise

These findings carry significant weight for developers and enterprise adopters of LLMs. The issue of “cost nondeterminism” is particularly stark and makes budgeting difficult. As the researchers point out, “Ideally, developers and users would prefer models for which the standard deviation on token usage per instance is low for cost predictability.”

“The profiling we do in [the study] could be useful for developers as a tool to pick which models are less volatile for the same prompt or for different prompts,” Besmira Nushi, senior principal research manager at Microsoft Research, told VentureBeat. “Ideally, one would want to pick a model that has low standard deviation for correct inputs.” 

Models that peak blue to the left consistently generate the same number of tokens at the given task. Credit: arXiv
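
The kind of profiling Nushi describes can be approximated with a short script: run each prompt several times, record tokens used, and compare the per-prompt standard deviation across models. The data layout, model names and numbers below are invented for illustration and are not taken from the study.

```python
from statistics import mean, stdev

# Hypothetical profiling data: model -> problem id -> tokens used in each repeated run.
TokenRuns = dict[str, dict[str, list[int]]]


def token_volatility(runs_by_problem: dict[str, list[int]]) -> float:
    """Mean per-problem standard deviation of token usage across repeated runs."""
    # stdev needs at least two runs per problem.
    return mean([stdev(counts) for counts in runs_by_problem.values()])


def least_volatile(profiles: TokenRuns) -> str:
    """Name of the model whose token usage varies least for the same prompt."""
    return min(profiles, key=lambda model: token_volatility(profiles[model]))


profiles = {
    "model_a": {"p1": [900, 1000, 1100], "p2": [2000, 2100, 1900]},
    "model_b": {"p1": [500, 4000, 9000], "p2": [1500, 7000, 300]},
}
print(least_volatile(profiles))  # model_a
```

To follow Nushi's suggestion more closely, the runs could first be filtered to those that produced a correct answer before computing the deviation.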

The study also offers insight into the correlation between a model’s accuracy and response length. For example, one diagram in the study shows that math queries longer than roughly 11,000 tokens have a very slim chance of being correct, so those generations should either be stopped at that point or restarted with some sequential feedback. However, Nushi points out that models allowing these post hoc mitigations also have a cleaner separation between correct and incorrect samples.
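
A length-based cutoff of this kind could be prototyped as follows. This is a sketch under assumptions: the 11,000-token figure is the approximate value cited above for math queries, the token stream is a generic iterable rather than any particular vendor's API, and both function names are hypothetical.

```python
from typing import Iterable, Optional

TOKEN_BUDGET = 11_000  # approximate figure cited for math queries; tune per model and task


def accuracy_above(samples: list[tuple[int, bool]], threshold: int) -> Optional[float]:
    """Empirical accuracy of past generations longer than `threshold` tokens."""
    long_runs = [correct for length, correct in samples if length > threshold]
    return sum(long_runs) / len(long_runs) if long_runs else None


def generate_with_budget(
    token_stream: Iterable[str], budget: int = TOKEN_BUDGET
) -> tuple[list[str], bool]:
    """Consume a token stream, stopping once `budget` tokens have been collected.

    Returns the collected tokens and a flag saying whether the stream was cut off.
    """
    tokens: list[str] = []
    for token in token_stream:
        if len(tokens) >= budget:
            return tokens, True  # caller may stop here or restart with feedback
        tokens.append(token)
    return tokens, False
```

Checking `accuracy_above` on logged generations first helps confirm whether a given model actually shows the clean separation that makes such a cutoff safe.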

“Ultimately, it is also the responsibility of model builders to think about reducing accuracy and cost non-determinism, and we expect a lot of this to happen as the methods get more mature,” Nushi said. “Alongside cost nondeterminism, accuracy nondeterminism also applies.”

Another important finding is the consistent performance boost from perfect verifiers, which highlights a critical area for future work: building robust and broadly applicable verification mechanisms. 

“The availability of stronger verifiers can have different types of impact,” Nushi said, such as improving foundational training methods for reasoning. “If used efficiently, these can also shorten the reasoning traces.”

Strong verifiers can also become a central part of enterprise agentic AI solutions. Many enterprise stakeholders already have such verifiers in place, such as SAT solvers and logistic validity checkers, which may need to be repurposed for more agentic solutions.
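
To make the idea of a verifier concrete: for a satisfiability problem such as the 3SAT tasks in the study, checking a proposed answer is cheap even when finding one is hard. The sketch below assumes a DIMACS-style integer encoding of clauses, and the function name is hypothetical. It checks a model-proposed assignment and does not itself solve the formula.

```python
Clause = list[int]  # DIMACS-style literals: 3 means x3 is true, -3 means x3 is false


def satisfies(clauses: list[Clause], assignment: dict[int, bool]) -> bool:
    """Check whether a proposed assignment satisfies every clause of a CNF formula."""
    def literal_holds(lit: int) -> bool:
        var = abs(lit)
        # An unassigned variable never satisfies a literal.
        return var in assignment and assignment[var] == (lit > 0)

    return all(any(literal_holds(lit) for lit in clause) for clause in clauses)


# (x1 or not x2 or x3) and (not x1 or x2 or x3) and (not x3 or not x1 or not x2)
formula = [[1, -2, 3], [-1, 2, 3], [-3, -1, -2]]
print(satisfies(formula, {1: True, 2: True, 3: False}))  # True
print(satisfies(formula, {1: True, 2: True, 3: True}))   # False
```

A check like this could serve as the selection step in parallel scaling or as the critic in sequential scaling, replacing majority vote with a definite pass or fail.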

“The questions for the future are how such existing techniques can be combined with AI-driven interfaces and what is the language that connects the two,” Nushi said. “The necessity of connecting the two comes from the fact that users will not always formulate their queries in a formal way, they will want to use a natural language interface and expect the solutions in a similar format or in a final action (e.g. propose a meeting invite).”

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Broadcom’s licensing clampdown: Subscription-less VMware users face legal ultimatum

Perhaps most concerning for enterprises, some organizations have reported receiving these legal threats even after completely migrating away from VMware technologies. One user on Reddit described receiving a cease-and-desist letter despite having already transitioned entirely to Proxmox, raising questions about Broadcom’s tracking capabilities and enforcement criteria. The notices universally include

Read More »

Quantum computing gets an error-correction boost from AI innovation

The RIKEN team, including Nori, Clemens Gneiting, and Yexiong Zeng, developed a deep learning method to optimize GKP states, making them easier to produce while maintaining robust error correction. “Our AI-driven method fine-tunes the structure of GKP states, striking an optimal balance between resource efficiency and error resilience,” said Zeng

Read More »

Nutanix expands beyond HCI

The Pure Storage integration will also be supported within Cisco’s FlashStack offering, creating a “FlashStack with Nutanix” solution with storage provided by Pure, networking capabilities as well as UCS servers from Cisco, and then the common Nutanix Cloud Platform. Cloud Native AOS: Breaking free from hypervisors Another sharp departure from

Read More »

Chevron Reveals Q1 Net Oil Equivalent Production Level

In its first quarter 2025 results statement, which was published recently, Chevron Corporation revealed that its net oil equivalent production averaged 3.353 million barrels of oil equivalent per day in the first quarter of this year. That figure was up slightly from the previous quarter and the corresponding quarter of last year, the report highlighted. Chevron produced 3.350 million barrels of oil equivalent per day in the fourth quarter of 2024 and 3.346 million barrels of oil equivalent per day in the first quarter of last year, the report outlined. According to the report, Chevron’s U.S. net oil equivalent production averaged 1.636 million barrels of oil equivalent per day in the first quarter of 2025, 1.646 million barrels per day in the fourth quarter of 2024, and 1.573 million barrels per day in the first quarter of 2024. The company’s international net oil equivalent output averaged 1.717 million barrels of oil equivalent per day in the first quarter of this year, 1.704 million barrels per day in the fourth quarter of last year, and 1.773 million barrels per day in the first quarter of 2024, the statement outlined. “Worldwide production was relatively flat from a year ago as the impacts of asset sales were mostly offset by growth at Tengizchevroil (20 percent), in the Permian Basin (12 percent), and in the Gulf of America (7 percent),” Chevron said in its statement. “U.S. net oil-equivalent production was up 63,000 barrels per day from a year earlier primarily due to higher production in the Permian Basin and Gulf of America, partly offset by lower production in the Rockies,” the company added. Looking at international output in the statement, Chevron said “net oil-equivalent production during the quarter was down 56,000 barrels per day from a year earlier primarily due to asset sales in

Read More »

EVOL: Bad news comes in threes

This week, there has been a whole lot of doom and gloom in the UK energy market. Harbour Energy announced 250 job cuts in Aberdeen, Ørsted discontinued its Hornsea 4 offshore wind project, and Drax opted not to bid for its Cruachan II in the Cap and Floor mechanism. Up first, news reporter Mat Perry discusses the UK’s largest oil and gas operator slashing 25% of its UK onshore headcount as it downgrades spending in the country. Aberdeen Features Lead Ryan Duff gives his two cents on the state of job losses in the north-east of Scotland and the wider North Sea. Brace yourself for explicit language. Next up, Mat and renewables reporter Michael Behr discuss why Ørsted has pulled the hand brake on a major offshore wind project. Pumped storage hydro was also mentioned, as Cruachan II gives the cap and floor scheme a swerve after battery firms argued that the mechanism was biased towards such projects. And finally, Michael chats with Guy Newey, chief executive officer at the Energy Systems Catapult, about the ever-divisive zonal pricing debate and how it could help drive innovation in the energy sector. Listen to Energy Voice Out Loud on your podcast platform of choice.

Read More »

Delayed Blenheim Palace solar project seeks £800m finance package

Planning for the UK’s largest solar farm has hit minor delays, as local opposition to the project mounts despite the huge biodiversity net gain touted by developers. German developer Photovolt Development Partners (PVDP) is seeking nearly £1 billion in debt and equity to build what would be the UK’s largest solar farm on land owned by the Blenheim estate, near Blenheim Palace, bordering Jeremy Clarkson’s farm. The project is due to enter a critical stage of planning next week, a month after originally expected. “The final decision to approve the project will be made by energy security and net zero secretary, Ed Miliband,” a spokesperson said. “The examination period, run by the planning inspectorate and the next stage in the process for Botley West, is due to commence on 13 May 2025. The decision on when the examination period begins is entirely down to the planning inspectorate.” The Botley West solar farm could meet at least 840 megawatts (MW) of the UK’s energy needs and play a huge role in decarbonising the local energy supply in Oxfordshire, according to developers. The Oxfordshire region is one of the country’s dirtiest fossil fuel systems, supplied mainly by the Didcot gas power station since the closure of an adjacent coal plant, but Botley West could change that. The public examination period for the project, which is expected to last for about six months, was expected to start in the middle of April – but has since been pushed back to May. Two specific hearings are expected to take place where heritage issues would be addressed and local venues could hold examinations, including open floor hearings. “The applicant themselves submitted two change requests on 28 March 2025, which the examining authority has now considered and determined to incorporate into the examination,” the planning inspectorate

Read More »

Suncor Profit Up on Record Refining Sales

Suncor Energy Inc. has reported CAD 1.69 billion ($1.21 billion) in net profit for the first quarter (Q1) of 2025, up from CAD 1.61 billion for the same three-month period last year as it achieved its highest-ever upstream output and refined product sales for the January-March period. Net income per basic share was CAD 1.36, compared to CAD 1.25 for Q1 2024. The Calgary, Canada-based oil sands developer, which also explores for and produces oil and gas offshore, produced 853,200 barrels per day (bpd) in Q1 2025, up from 835,300 bpd in Q1 2024. Oil sands production averaged 790,900 bpd, compared to 785,000 bpd in Q1 2024. Offshore production was 62,300 bpd, up from 50,300 bpd in Q1 2204, Suncor said in its quarterly report. Suncor said the White Rose field in the Jeanne d’Arc Basin off the coast of Newfoundland and Labrador province has resumed production. That partly led to the year-on-year production increase as the field had no volume contribution in Q1 2024. However, even as upstream production grew, operating income adjusted for nonrecurring items fell from CAD 1.82 billion for Q1 2024 to CAD 1.63 billion for Q1 2025 due to upstream sales volumes decreasing from 847,400 bpd to 828,400 bpd. Suncor attributed the sales volume decline to “a build in inventory as production remained strong”. That drop was partially offset by an increase in refined product sales from 581,000 bpd in Q1 2024 to 604,900 bpd in Q1 2025 as processed oil rose from 455,300 bpd to 482,700 bpd, also a Q1 record. On the price side, “Higher Oil Sands price realizations, which benefited from narrower differentials compared to the prior year quarter, were mostly offset by lower downstream benchmark crack spreads”, the report stated. “The Canadian dollar also weakened against the U.S. dollar in the

Read More »

CIP to Invest $500 Million in BKV CCUS Projects

BKV Corp. and Copenhagen Infrastructure Partners (CIP) have agreed to form a joint venture (JV) on carbon capture, utilization and storage (CCUS) in the United States. Denver, Colorado-based BKV, a producer of natural gas and power, will contribute its stakes in existing and future CCUS projects to the JV. Meanwhile energy transition-focused Danish investor CIP will invest $500 million in the JV for a stake of 49 percent. “BKV has contributed to the JV its ownership of the Barnett Zero and Eagle Ford projects, and has committed to future contributions of CCUS projects, related assets and/or cash, in exchange for a 51 percent interest in the JV”, a joint statement said. The Barnett Zero Project in the Texan city of Bridgeport has sequestered over 200,000 metric tons of carbon dioxide (CO2) equivalent emissions since it began operation November 2023. Serving natural gas processing plants, Barnett Zero has a sequestration rate of about 185,000 metric tons a year, according to BKV. It co-owns the project with EnLink Midstream LLC. Meanwhile the proposed Eagle Ford Project, to be built at a natural gas processing plant in South Texas, is expected to reach full operation next year. BKV, which has partnered with a midstream company for the project, anticipates the sequestration rate to be around 90,000 metric tons per annum of CO2 equivalent. Approved by BKV December 2024, the project has received approval from the Texas Railroad Commission for its Class II injection well. A monitoring, reporting and verification plan has also been submitted to the United States Environmental Protection Agency for approval, according to BKV. “The JV will leverage BKV’s standing as an early leader in developing CCUS projects while benefiting from CIP’s significant experience in developing low-carbon infrastructure projects”, the statement said. “BKV and CIP expect to identify investment-ready projects for development

Read More »

USA Crude Oil Inventories Drop Week on Week

U.S. commercial crude oil inventories, excluding those in the Strategic Petroleum Reserve (SPR), decreased by two million barrels from the week ending April 25 to the week ending May 2, the U.S. Energy Information Administration (EIA) highlighted in its latest weekly petroleum status report. That EIA report was released on May 7 and included data for the week ending May 2. It showed that crude oil stocks, not including the SPR, stood at 438.4 million barrels on May 2, 440.4 million barrels on April 25, and 459.5 million barrels on May 3, 2024. Crude oil in the SPR stood at 399.1 million barrels on May 2, 398.5 million barrels on April 25, and 367.2 million barrels on May 3, 2024, the report outlined. Total petroleum stocks – including crude oil, total motor gasoline, fuel ethanol, kerosene type jet fuel, distillate fuel oil, residual fuel oil, propane/propylene, and other oils – stood at 1.612 billion barrels on May 2, the report showed. Total petroleum stocks were up 1.7 million barrels week on week and up 5.7 million barrels year on year, the report revealed. “At 438.4 million barrels, U.S. crude oil inventories are about seven percent below the five year average for this time of year,” the EIA said in its latest weekly petroleum status report. “Total motor gasoline inventories increased by 0.2 million barrels from last week and are about three percent below the five year average for this time of year. Finished gasoline inventories increased and blending components inventories decreased last week,” it added. “Distillate fuel inventories decreased by 1.1 million barrels last week and are about 13 percent below the five year average for this time of year. Propane/propylene inventories increased by one million barrels from last week and are 11 percent below the five year average for

Read More »

Tech CEOs warn Senate: Outdated US power grid threatens AI ambitions

The implications are clear: without dramatic improvements to the US energy infrastructure, the nation’s AI ambitions could be significantly constrained by simple physical limitations – the inability to power the massive computing clusters necessary for advanced AI development and deployment. Streamlining permitting processes The tech executives have offered specific recommendations to address these challenges, with several focusing on the need to dramatically accelerate permitting processes for both energy generation and the transmission infrastructure needed to deliver that power to AI facilities, the report added. Intrator specifically called for efforts “to streamline the permitting process to enable the addition of new sources of generation and the transmission infrastructure to deliver it,” noting that current regulatory frameworks were not designed with the urgent timelines of the AI race in mind. This acceleration would help technology companies build and power the massive data centers needed for AI training and inference, which require enormous amounts of electricity delivered reliably and consistently. Beyond the cloud: bringing AI to everyday devices While much of the testimony focused on large-scale infrastructure needs, AMD CEO Lisa Su emphasized that true AI leadership requires “rapidly building data centers at scale and powering them with reliable, affordable, and clean energy sources.” Su also highlighted the importance of democratizing access to AI technologies: “Moving faster also means moving AI beyond the cloud. To ensure every American benefits, AI must be built into the devices we use every day and made as accessible and dependable as electricity.”

Read More »

Networking errors pose threat to data center reliability

Still, IT and networking issues increased in 2024, according to Uptime Institute. The analysis attributed the rise in outages due to increased IT and network complexity, specifically, change management and misconfigurations. “Particularly with distributed services, cloud services, we find that cascading failures often occur when networking equipment is replicated across an entire network,” Lawrence explained. “Sometimes the failure of one forces traffic to move in one direction, overloading capacity at another data center.” The most common causes of major network-related outages were cited as: Configuration/change management failure: 50% Third-party network provider failure: 34% Hardware failure: 31% Firmware/software error: 26% Line breakages: 17% Malicious cyberattack: 17% Network overload/congestion failure: 13% Corrupted firewall/routing tables issues: 8% Weather-related incident: 7% Configuration/change management issues also attributed for 62% of the most common causes of major IT system-/software-related outages. Change-related disruptions consistently are responsible for software-related outages. Human error continues to be one of the “most persistent challenges in data center operations,” according to Uptime’s analysis. The report found that the biggest cause of these failures is data center staff failing to follow established procedures, which has increased by about 10 percentage points compared to 2023. “These are things that were 100% under our control. I mean, we can’t control when the UPS module fails because it was either poorly manufactured, it had a flaw, or something else. This is 100% under our control,” Brown said. The most common causes of major human error-related outages were reported as:

Read More »

Liquid cooling technologies: reducing data center environmental impact

“Highly optimized cold-plate or one-phase immersion cooling technologies can perform on par with two-phase immersion, making all three liquid-cooling technologies desirable options,” the researchers wrote. Factors to consider There are numerous factors to consider when adopting liquid cooling technologies, according to Microsoft’s researchers. First, they advise performing a full environmental, health, and safety analysis, and end-to-end life cycle impact analysis. “Analyzing the full data center ecosystem to include systems interactions across software, chip, server, rack, tank, and cooling fluids allows decision makers to understand where savings in environmental impacts can be made,” they wrote. It is also important to engage with fluid vendors and regulators early, to understand chemical composition, disposal methods, and compliance risks. And associated socioeconomic, community, and business impacts are equally critical to assess. More specific environmental considerations include ozone depletion and global warming potential; the researchers emphasized that operators should only use fluids with low to zero ozone depletion potential (ODP) values, and not hydrofluorocarbons or carbon dioxide. It is also critical to analyze a fluid’s viscosity (thickness or stickiness), flammability, and overall volatility. And operators should only use fluids with minimal bioaccumulation (the buildup of chemicals in lifeforms, typically in fish) and terrestrial and aquatic toxicity. Finally, once up and running, data center operators should monitor server lifespan and failure rates, tracking performance uptime and adjusting IT refresh rates accordingly.

Read More »

Cisco unveils prototype quantum networking chip

Clock synchronization allows for coordinated time-dependent communications between end points that might be cloud databases or in large global databases that could be sitting across the country or across the world, he said. “We saw recently when we were visiting Lawrence Berkeley Labs where they have all of these data sources such as radio telescopes, optical telescopes, satellites, the James Webb platform. All of these end points are taking snapshots of a piece of space, and they need to synchronize those snapshots to the picosecond level, because you want to detect things like meteorites, something that is moving faster than the rotational speed of planet Earth. So the only way you can detect that quickly is if you synchronize these snapshots at the picosecond level,” Pandey said. For security use cases, the chip can ensure that if an eavesdropper tries to intercept the quantum signals carrying the key, they will likely disturb the state of the qubits, and this disturbance can be detected by the legitimate communicating parties and the link will be dropped, protecting the sender’s data. This feature is typically implemented in a Quantum Key Distribution system. Location information can serve as a critical credential for systems to authenticate control access, Pandey said. The prototype quantum entanglement chip is just part of the research Cisco is doing to accelerate practical quantum computing and the development of future quantum data centers.  The quantum data center that Cisco envisions would have the capability to execute numerous quantum circuits, feature dynamic network interconnection, and utilize various entanglement generation protocols. The idea is to build a network connecting a large number of smaller processors in a controlled environment, the data center warehouse, and provide them as a service to a larger user base, according to Cisco.  The challenges for quantum data center network fabric

Read More »

Zyxel launches 100GbE switch for enterprise networks

Port specifications include: 48 SFP28 ports supporting dual-rate 10GbE/25GbE connectivity 8 QSFP28 ports supporting 100GbE connections Console port for direct management access Layer 3 routing capabilities include static routing with support for access control lists (ACLs) and VLAN segmentation. The switch implements IEEE 802.1Q VLAN tagging, port isolation, and port mirroring for traffic analysis. For link aggregation, the switch supports IEEE 802.3ad for increased throughput and redundancy between switches or servers. Target applications and use cases The CX4800-56F targets multiple deployment scenarios where high-capacity backbone connectivity and flexible port configurations are required. “This will be for service providers initially or large deployments where they need a high capacity backbone to deliver a primarily 10G access layer to the end point,” explains Nguyen. “Now with Wi-Fi 7, more 10G/25G capable POE switches are being powered up and need interconnectivity without the bottleneck. We see this for data centers, campus, MDU (Multi-Dwelling Unit) buildings or community deployments.” Management is handled through Zyxel’s NebulaFlex Pro technology, which supports both standalone configuration and cloud management via the Nebula Control Center (NCC). The switch includes a one-year professional pack license providing IGMP technology and network analytics features. The SFP28 ports maintain backward compatibility between 10G and 25G standards, enabling phased migration paths for organizations transitioning between these speeds.

Read More »

Engineers rush to master new skills for AI-driven data centers

According to the Uptime Institute survey, 57% of data centers are increasing salary spending. Data center job roles that saw the highest increases were in operations management – 49% of data center operators said they saw highest increases in this category – followed by junior and mid-level operations staff at 45%, and senior management and strategy at 35%. Other job categories that saw salary growth were electrical, at 32% and mechanical, at 23%. Organizations are also paying premiums on top of salaries for particular skills and certifications. Foote Partners tracks pay premiums for more than 1,300 certified and non-certified skills for IT jobs in general. The company doesn’t segment the data based on whether the jobs themselves are data center jobs, but it does track 60 skills and certifications related to data center management, including skills such as storage area networking, LAN, and AIOps, and 24 data center-related certificates from Cisco, Juniper, VMware and other organizations. “Five of the eight data center-related skills recording market value gains in cash pay premiums in the last twelve months are all AI-related skills,” says David Foote, chief analyst at Foote Partners. “In fact, they are all among the highest-paying skills for all 723 non-certified skills we report.” These skills bring in 16% to 22% of base salary, he says. AIOps, for example, saw an 11% increase in market value over the past year, now bringing in a premium of 20% over base salary, according to Foote data. MLOps now brings in a 22% premium. “Again, these AI skills have many uses of which the data center is only one,” Foote adds. The percentage increase in the specific subset of these skills in data centers jobs may vary. The Uptime Institute survey suggests that the higher pay is motivating workers to stay in the

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote $200 billion between them to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd).

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »