Meet the new king of AI coding: Google’s Gemini 2.5 Pro I/O Edition dethrones Claude 3.7 Sonnet

There’s a new king on the throne of AI coding models: Today, Google’s DeepMind AI research unit unveiled Gemini 2.5 Pro “I/O” edition, a new version of the hit Gemini 2.5 Pro multimodal large language model (LLM) it released back in March. DeepMind CEO Demis Hassabis said on X that it is “the best coding model we’ve ever built!”

Indeed, the initial benchmarks released by the company indicate Google has taken the lead — for the first time since the generative AI race began in earnest with the late 2022 launch of ChatGPT — above all other models on at least one important coding benchmark.

The new version, labeled “gemini-2.5-pro-preview-05-06,” replaces the previous 03-25 release and is now available for indie developers in Google AI Studio and for enterprises in the Vertex AI cloud platform, as well as to individual users in the Gemini app. Google’s blog post said it also powers the Gemini mobile app’s Canvas and other features.

The new version powers feature development in apps like Gemini 95, where the model helps match visual styles across components automatically. It also enables workflows like converting YouTube videos into full-featured learning applications and crafting highly styled components—such as responsive video players or animated dictation UIs—with little to no manual CSS editing.

It’s a proprietary model, meaning enterprises will have to pay Google to use it and can access it only through Google’s web services. However, the update doesn’t alter pricing or rate limits; current users of Gemini 2.5 Pro will be automatically routed to the updated model, which costs $1.25 per million input tokens and $10 per million output tokens (for prompts of up to 200,000 tokens), compared to Claude 3.7 Sonnet’s $3/$15.
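To make the price gap concrete, here is a minimal sketch that estimates the cost of a single request at the list prices quoted above. The workload size is a hypothetical example, and the `claude-3.7-sonnet` key is only a label used here, not an official API identifier.

```python
# Prices in USD per one million tokens, as quoted in the article.
# The Gemini figures apply to prompts of up to 200,000 tokens.
PRICES_PER_MILLION = {
    "gemini-2.5-pro-preview-05-06": {"input": 1.25, "output": 10.00},
    "claude-3.7-sonnet": {"input": 3.00, "output": 15.00},  # label is an assumption
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted list prices."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 50,000-token prompt and a 4,000-token reply.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 50_000, 4_000):.4f}")
```

For this assumed workload, the Gemini request comes to roughly half the cost of the Claude request. Actual bills depend on the real token counts and any pricing tiers not covered here.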

The company frames this move — ahead of Google’s annual I/O (input/output) developer conference later this month in Mountain View and online, May 20-21 — as a response to strong community feedback around Gemini’s practical utility in real-world code generation and interface design.

Logan Kilpatrick, Senior Product Manager for Gemini API and Google AI Studio, confirmed in a developer blog post that the update also addresses key developer feedback around function calling, with improvements in error reduction and trigger reliability.

Top scores from human raters at generating web apps

On the WebDev Arena Leaderboard, a third-party metric that ranks models by human preference based on their ability to generate visually appealing and functional web apps, Gemini 2.5 Pro Preview (05-06) has now overtaken Anthropic’s Claude 3.7 Sonnet for the number one spot.

The new version scored 1499.95 on the leaderboard, placing it well ahead of Sonnet 3.7’s 1377.10. The previous Gemini 2.5 Pro (03-25) model held third place with a score of 1278.96, meaning the I/O edition represents a 221-point jump.
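For readers who want to check the arithmetic, the sketch below recomputes the gaps between the reported scores. It also converts a gap into an expected head-to-head preference rate, under the assumption that the leaderboard score behaves like a standard Elo rating with a 400-point logistic scale. That interpretation is an assumption for illustration, not something the article states.

```python
# Leaderboard scores as reported in the article.
SCORES = {
    "Gemini 2.5 Pro Preview (05-06)": 1499.95,
    "Claude 3.7 Sonnet": 1377.10,
    "Gemini 2.5 Pro (03-25)": 1278.96,
}


def score_gap(model_a: str, model_b: str) -> float:
    """Return how many points model_a leads model_b by."""
    return SCORES[model_a] - SCORES[model_b]


def elo_expected_win_rate(gap: float) -> float:
    """Expected win rate for the higher-rated side, ASSUMING an Elo-style scale."""
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


if __name__ == "__main__":
    new = "Gemini 2.5 Pro Preview (05-06)"
    for other in ("Claude 3.7 Sonnet", "Gemini 2.5 Pro (03-25)"):
        gap = score_gap(new, other)
        rate = elo_expected_win_rate(gap)
        print(f"{new} vs {other}: +{gap:.2f} points, ~{rate:.0%} expected preference")
```

Under that assumption, the lead of about 123 points over Claude 3.7 Sonnet would correspond to human raters preferring the new Gemini output in roughly two out of three pairwise comparisons.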

As noted by the AI power user “Lisan al Gaib” on X, not even OpenAI’s o3 was able to displace Sonnet 3.7, highlighting the significance of Gemini’s advancement.

Gemini’s performance boost reflects improved reliability, aesthetics, and usability in its outputs.

Already winning rave reviews

Several developers and platform leaders have highlighted the model’s improved reliability and application in production scenarios.

Cognition’s Silas Alberti noted that Gemini 2.5 Pro was the first model to successfully complete a complex refactoring of a backend routing system, demonstrating the kind of decision-making one would expect from a senior developer.

Michael Truell, CEO of the AI coding tool Cursor, said internal testing shows a marked decrease in tool call failures, a previously noted issue. He expects users to find the latest version significantly more effective in hands-on environments. Cursor has already integrated Gemini 2.5 Pro into its own code agent, reflecting how developers are using the model as a key component in more intelligent developer workflows.

Michele Catasta, President of Replit, described Gemini 2.5 Pro as the best frontier model for balancing capability with latency. His comments suggest that Replit is considering integration of the model into its own tools, especially for tasks where high responsiveness and reliability are crucial.

Similarly, AI educator and BlueShell private AI chatbot founder Paul Couvert noted on X that “Its code and UI generation capabilities are impressive.”

And as Pietro Schirano, CEO of the AI art tool EverArt, noted on X, the new Gemini 2.5 Pro I/O edition was able to generate an interactive simulation of the “1 gorilla vs. 100 men” meme that’s been circulating on social media lately from a single prompt.

Showing off another interactive Tetris-style puzzle game with working sound effects reportedly created in less than a minute, X user “RameshR” (@rezmeram) wrote that “the casual game industry is dead!!”

These endorsements add weight to DeepMind’s claims of practical improvements and may encourage broader adoption across developer platforms.

Full apps and programs from one text prompt

One of the standout features of the update is its ability to build full, interactive web apps or simulations from a single prompt.

This aligns with DeepMind’s vision of simplifying the prototyping and development process.

Demonstrations within the Gemini app showcase how users can transform visual patterns or thematic prompts into usable code, lowering the barrier to entry for design-oriented developers and teams experimenting with new ideas.

Although the architecture and under-the-hood changes of Gemini 2.5 Pro have not been detailed publicly, the emphasis remains on enabling faster, more intuitive development experiences.

By leaning into its strengths in code generation and multimodal inputs, Gemini 2.5 Pro is positioned less as a research novelty and more as a practical tool for real-world coding challenges. The early release reflects a clear intention from Google DeepMind to meet developer demand and maintain momentum ahead of its major conference announcements.

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

IBM wrangles AI agents to work across complex enterprise environments

In addition, the new Agent Catalog in watsonx Orchestrate can simplify access to more than 150 agents and pre-built tools from IBM and its partners, which include Box, MasterCard, Oracle, Salesforce, ServiceNow, and Symplistic.ai. IBM is also rolling out an agent builder tool in June that will let customers build their own agents in less

ServiceNow launches AI agent command center, communication backbone

By having a governing platform in place, enterprises will be able to achieve better results with their AI agents and AI initiatives, industry watchers say. “By 2028, enterprises using AI governance platforms will achieve 30% higher customer trust ratings and 25% better regulatory compliance scores than their competitors,” according to

Network and security vulnerabilities linked to 60% of zero-day cyberattacks

According to Casey Charrier, senior analyst at GTIG: “Zero-day exploitation continues to grow at a slow but steady pace. However, we have also started to see vendors’ work to mitigate zero-day exploitation begin to pay off. For example, we have seen fewer instances of zero-day exploitation targeting products that have

India takes first big step in Quantum Computing supremacy race

The broader vision is to create high-end jobs, attract global investment, and enable enterprises to solve previously intractable problems — such as drug discovery and real-time logistics optimization — through quantum-powered solutions. The new tech park at Amaravati will host research labs, startup incubators, and training programs to build a

Macquarie Strategists Forecast USA Crude Inventory Build

In an oil and gas report sent to Rigzone late Monday by the Macquarie team, Macquarie strategists revealed that they are forecasting that U.S. crude inventories will be up by 2.4 million barrels for the week ending May 2. “This compares to our early look which anticipated a 5.2 million barrel build,” the strategists said in the report. “On the product side of the ledger, in aggregate, our expectations are also revised tighter for this week,” they added. In the report, the strategists noted that, “for this week’s crude balance, from refineries”, they “model crude runs modestly lower (-0.2 million barrels per day)”. “Among net imports, we model a large increase, with exports lower (-0.2 million barrels per day) and imports higher (+0.8 million barrels per day) on a nominal basis,” they added. The Macquarie strategists warned in the report that timing of cargoes remains a source of potential volatility in this week’s crude balance. They went on to note that, “from implied domestic supply (prod.+adj.+transfers)”, they “look for a reduction (-0.5 million barrels per day) following a strong nominal print last week”. “Rounding out the picture, we anticipate a smaller increase in SPR [Strategic Petroleum Reserve] stocks (+0.6 million barrels) this week,” the strategists said. The strategists also highlighted in the report that, “among products”, they “look for draws in gasoline (-2.7 million barrels) and distillate (-1.9 million barrels), with jet stocks nearly flat (+0.1 million barrels)”. “We model implied demand for these three products at ~14.5 million barrels per day for the week ending May 2,” the Macquarie strategists went on to state. In an oil and gas report sent to Rigzone by the Macquarie team on Thursday, Macquarie strategists outlined that they “anticipate a healthy U.S. crude build” in the U.S. Energy Information Administration’s (EIA) next weekly

Uniper Q1 Profit Slumps

Uniper SE on Tuesday reported EUR 82 million ($93.2 million) in net income for the first quarter (Q1), down about 83 percent from Q1 2024. Adjusted for non-operating impacts, the bottom line is a net loss of EUR 143 million, compared to EUR 581 million in net profit for Q1 2024. However, sales rose from EUR 17.98 billion for Q1 2024 to EUR 21.26 billion for Q1 2025. The Green Generation segment generated EUR 246 million in adjusted earnings before interest, taxes, depreciation and amortization (EBITDA) for the January-March 2025 period, down from EUR 278 million for Q1 2024. “The continued decline in price levels in Sweden led to lower earnings at Uniper’s nuclear and hydropower businesses there”, the Düsseldorf-based gas and power utility said in an online statement. “Exceptionally high water levels in reservoirs, resulting from high inflow due to a mild winter, had a significantly adverse impact on price levels particularly in the northern regions of Sweden. “However, the decline in earnings in Sweden was largely offset by Uniper’s hydropower portfolio in Germany, which delivered a positive earnings performance relative to the first quarter of the prior year thanks to a more favorable market environment”. Flexible Generation logged EUR 161 million in adjusted EBITDA, compared to EUR 656 million for Q1 2024. “The decline is particularly attributable to a reduction in earnings on hedging transactions on the fossil trading margin due to the general decline in price levels”, Uniper said. “In addition, the decommissioning of Ratcliffe power plant in the United Kingdom and Heyden 4 in Germany, the sale of the Gönyu power plant in Hungary, and the transfer of Staudinger 5 and Scholven B and C power plants in Germany to grid reserve had a negative impact on earnings compared with the prior-year quarter”. Greener Commodities registered negative

Iberdrola Raises Quarterly Revenue to $14.6 Billion

Iberdrola SA has reported EUR 12.86 billion ($14.62 billion) in revenue for the first quarter (Q1), up 1.5 percent from the same three-month period last year. However, net profit fell to EUR 2 billion, or EUR 0,302 per share – compared to EUR 2.76 billion for Q1 2025. Earnings before interest, taxes, depreciation and amortization (EBITDA) dropped from EUR 5.86 billion for Q1 2024 to EUR 4.64 billion for Q1 2025. “Excluding the capital gains from the divestment of thermal generation assets in the first quarter of 2024, net profit increased by 26 percent and EBITDA increased by 12 percent”, the Spanish utility said in its quarterly report. Iberdrola credited a record quarterly investment of EUR 2.72 billion for the pre-divestment increase in earnings. “The 12 percent increase in EBITDA was due to strong operational performance, with an increase in the company’s regulated profile, as 52 percent of EBITDA come from the Networks business, affected by the recognition of costs incurred in previous years under IFRS [international financial reporting standards] in the US”, Iberdrola said. “The contribution of the Electricity Production and Customers business decreases 8 percent, with a higher production in the United States, Rest of the World and Iberia, which partially offset the normalization of the margins in Iberia and in the United Kingdom”. Executive chair Ignacio Galán said in a company statement accompanying the report, “Our record investment in this quarter, and our planned future investments in networks, show how we’re focused on speeding up electrification in order to reduce external energy dependency, improve competitiveness, promote local industries and jobs and deliver price stability”. Iberdrola reported a net production of over 35,500 gigawatt hours (gWh), down 13.3 percent year-over-year as declines in gas combined-cycle generation and cogeneration offset a renewables increase to more than 25,200 gWh. Electricity supplies totaled

Miliband sets out proposals for solar canopies above car parks

Supermarkets, offices and shopping centres could be required to install solar panels over their car parks under plans being considered by the Government. The plan to create “solar carports” would generate energy to power homes, businesses and electric vehicles. The call for evidence from Ed Miliband’s Department for Energy Security and Net Zero will consider making solar panels mandatory for new car parks but also explore extending that to existing parking lots. It will also examine the cost of installing the panels above parking spaces. Officials believe that mandating the installation of solar panels in canopies over car parks would unlock “underutilised” space, create shaded parking spots and more charging points for electric vehicles. Energy Secretary Mr Miliband said: “Right now, the sun is shining on hundreds of thousands of car parking spaces across the country which could be used to power our homes and businesses. “We want to work with businesses and car park operators to turn our car parks into solar carports to save families and businesses money with clean, homegrown British energy.” The Government estimates that an 80-space car park could save around £28,000 per year in electricity bills by installing solar carports and using all electricity generated.

Welsh government completes £2m equity investment in tidal player Inyanga

The Welsh government announced on May 6 that it had completed a £2m equity investment in Inyanga Marine Energy Group, one of the companies involved in the tidal energy development at Morlais, Ynys Môn (Anglesey). The investment was also due to be announced by Welsh First Minister Eluned Morgan at the Marine Energy Wales Conference in Cardiff on May 6. It is expected to be used to fund improvements to tidal turbines, allowing them to produce up to 60% more energy, and then helping to test these improved turbines in real sea conditions at the Morlais tidal energy site. “The improved turbines will explore making tidal energy more practical, helping speed up the global move away from fossil fuels,” Morgan stated. The turbines will power most of the tidal energy projects planned for the Morlais site, which is located in an area characterised by strong currents. Morlais is being developed as a ‘plug and play’ model described as the first of its kind globally. Under plans for the site, Morlais, which is owned and managed by Ynys Môn social enterprise Menter Môn, will install infrastructure including a connection to the national grid and an onshore substation near Ynys Lawd (South Stack) and Parc Cybi. Morlais will then rent berths to various turbine development companies so they can use tidal energy to generate electricity, potentially allowing different types of electricity generation technology to be installed as part of the project. © Supplied by Welsh GovernmentMorlais tidal stream energy project will connect to the national grid and a substation on the shore near Ynys Lawd (South Stack). Inyanga is among the companies planning to participate in Morlais, with its HydroWing project, which has been awarded two contracts for difference (CfDs) in the UK government’s 2023 and 2024 allocation rounds, each for 10

Wind industry deals and vessel innovation power ahead

Acquisitions have dominated the shipping industry of late, namely Havfram of Norway’s takeover by Belgian group DEME; and that of Sweden-rooted Northern Offshore Group by Japanese shipping giant NYK. But before we get to the main course, we’re starting with a taster – completion of a pocket service vessel built for the UK-headquartered, but fundamentally Turkish-owned, Tor Group and developed by Macduff Ship Design, which, locally in Scotland, is famous as a designer of excellent fishing vessels. Tor Boreas is a cutting-edge, powerful, multi-purpose service vessel, but it is designed to be particularly able to serve offshore wind sector needs. The diesel-electric vessel is designed to comply with the not more than 24m registered length rules to allow it to operate under UK MCA Workboat Code Edition III regulations, but has also been built to Bureau Veritas regulations and approval for international operation. The deck gear pack includes a Melcal 13-tonne offshore crane, a Melcal hydraulic A-frame with a five-tonne lifting capacity, a 25-tonne electrically driven aft towing winch, a 500mm stern roller, and an aft deck with a clear deck area of more than 230 square feet. The open deck is capable of safely loading up to 60 tonnes of deck cargo or 3 x 20ft containers, and is arranged with modular installation for dive and survey support units. Remarkably for such a compact boat, Tor Boreas is also equipped with a 1500mm x 1500mm moon pool and is fitted with a VEEM stabiliser system. This should significantly reduce at sea motion, so enabling service operations to be conducted in poorer sea states than would normally be possible; and greater crew comfort. © Supplied by Macduff Ship DesignTor Boreas vessel. Takeovers First up, DEME acquired Havfram in a deal worth about €900 million (£770,787,000). Havfram is primarily owned by US

Liquid cooling technologies: reducing data center environmental impact

“Highly optimized cold-plate or one-phase immersion cooling technologies can perform on par with two-phase immersion, making all three liquid-cooling technologies desirable options,” the researchers wrote. Factors to consider There are numerous factors to consider when adopting liquid cooling technologies, according to Microsoft’s researchers. First, they advise performing a full environmental, health, and safety analysis, and end-to-end life cycle impact analysis. “Analyzing the full data center ecosystem to include systems interactions across software, chip, server, rack, tank, and cooling fluids allows decision makers to understand where savings in environmental impacts can be made,” they wrote. It is also important to engage with fluid vendors and regulators early, to understand chemical composition, disposal methods, and compliance risks. And associated socioeconomic, community, and business impacts are equally critical to assess. More specific environmental considerations include ozone depletion and global warming potential; the researchers emphasized that operators should only use fluids with low to zero ozone depletion potential (ODP) values, and not hydrofluorocarbons or carbon dioxide. It is also critical to analyze a fluid’s viscosity (thickness or stickiness), flammability, and overall volatility. And operators should only use fluids with minimal bioaccumulation (the buildup of chemicals in lifeforms, typically in fish) and terrestrial and aquatic toxicity. Finally, once up and running, data center operators should monitor server lifespan and failure rates, tracking performance uptime and adjusting IT refresh rates accordingly.

Cisco unveils prototype quantum networking chip

Clock synchronization allows for coordinated time-dependent communications between end points that might be cloud databases or in large global databases that could be sitting across the country or across the world, he said. “We saw recently when we were visiting Lawrence Berkeley Labs where they have all of these data sources such as radio telescopes, optical telescopes, satellites, the James Webb platform. All of these end points are taking snapshots of a piece of space, and they need to synchronize those snapshots to the picosecond level, because you want to detect things like meteorites, something that is moving faster than the rotational speed of planet Earth. So the only way you can detect that quickly is if you synchronize these snapshots at the picosecond level,” Pandey said. For security use cases, the chip can ensure that if an eavesdropper tries to intercept the quantum signals carrying the key, they will likely disturb the state of the qubits, and this disturbance can be detected by the legitimate communicating parties and the link will be dropped, protecting the sender’s data. This feature is typically implemented in a Quantum Key Distribution system. Location information can serve as a critical credential for systems to authenticate control access, Pandey said. The prototype quantum entanglement chip is just part of the research Cisco is doing to accelerate practical quantum computing and the development of future quantum data centers. The quantum data center that Cisco envisions would have the capability to execute numerous quantum circuits, feature dynamic network interconnection, and utilize various entanglement generation protocols. The idea is to build a network connecting a large number of smaller processors in a controlled environment, the data center warehouse, and provide them as a service to a larger user base, according to Cisco. The challenges for quantum data center network fabric

Zyxel launches 100GbE switch for enterprise networks

Port specifications include: 48 SFP28 ports supporting dual-rate 10GbE/25GbE connectivity 8 QSFP28 ports supporting 100GbE connections Console port for direct management access Layer 3 routing capabilities include static routing with support for access control lists (ACLs) and VLAN segmentation. The switch implements IEEE 802.1Q VLAN tagging, port isolation, and port mirroring for traffic analysis. For link aggregation, the switch supports IEEE 802.3ad for increased throughput and redundancy between switches or servers. Target applications and use cases The CX4800-56F targets multiple deployment scenarios where high-capacity backbone connectivity and flexible port configurations are required. “This will be for service providers initially or large deployments where they need a high capacity backbone to deliver a primarily 10G access layer to the end point,” explains Nguyen. “Now with Wi-Fi 7, more 10G/25G capable POE switches are being powered up and need interconnectivity without the bottleneck. We see this for data centers, campus, MDU (Multi-Dwelling Unit) buildings or community deployments.” Management is handled through Zyxel’s NebulaFlex Pro technology, which supports both standalone configuration and cloud management via the Nebula Control Center (NCC). The switch includes a one-year professional pack license providing IGMP technology and network analytics features. The SFP28 ports maintain backward compatibility between 10G and 25G standards, enabling phased migration paths for organizations transitioning between these speeds.

Engineers rush to master new skills for AI-driven data centers

According to the Uptime Institute survey, 57% of data centers are increasing salary spending. Data center job roles that saw the highest increases were in operations management – 49% of data center operators said they saw highest increases in this category – followed by junior and mid-level operations staff at 45%, and senior management and strategy at 35%. Other job categories that saw salary growth were electrical, at 32% and mechanical, at 23%. Organizations are also paying premiums on top of salaries for particular skills and certifications. Foote Partners tracks pay premiums for more than 1,300 certified and non-certified skills for IT jobs in general. The company doesn’t segment the data based on whether the jobs themselves are data center jobs, but it does track 60 skills and certifications related to data center management, including skills such as storage area networking, LAN, and AIOps, and 24 data center-related certificates from Cisco, Juniper, VMware and other organizations. “Five of the eight data center-related skills recording market value gains in cash pay premiums in the last twelve months are all AI-related skills,” says David Foote, chief analyst at Foote Partners. “In fact, they are all among the highest-paying skills for all 723 non-certified skills we report.” These skills bring in 16% to 22% of base salary, he says. AIOps, for example, saw an 11% increase in market value over the past year, now bringing in a premium of 20% over base salary, according to Foote data. MLOps now brings in a 22% premium. “Again, these AI skills have many uses of which the data center is only one,” Foote adds. The percentage increase in the specific subset of these skills in data centers jobs may vary. The Uptime Institute survey suggests that the higher pay is motivating workers to stay in the

ExtraHop looks to eliminate ‘extra hops’ in NDR stack

This deep visibility allows ExtraHop to provide insights across the entire network stack, from basic connectivity to application-level transactions. “The benefit of going all the way through Layer 7 is I can actually see a database transaction going through on the wire,” Vasani said. “If you have application teams complaining about database query latency, we can map it to what session was that tied to and what flows was it tied to from a network perspective and is this really an app server issue, or is it a network issue, or is it an endpoint issue?” The new sensor integrates with ExtraHop’s RevealX platform, feeding telemetry into the company’s cloud-scale ML/AI engine that powers its detection and analysis capabilities. “The sensor collects the telemetry, feeds it into an ML/AI engine that sits in the cloud, and then we layer in workflow engines on top to enable the various use cases,” Vasani said. In modern distributed enterprise environments, network visibility must extend beyond traditional data centers. ExtraHop’s all-in-one sensor is designed to address this reality with deployment options that span physical appliances, virtual machines and cloud environments. ExtraHop has both virtual and physical hardware appliances for sensor deployment. ExtraHop sensors can plug into a network through multiple methods including, Network Tap, SPAN (Switched Port Analyzer) port, packet broker or a cloud provider’s vTAP capabilities.

AI’s energy appetite drives interest in nuclear power

In its new report, Deloitte said that its analysis of figures from the World Nuclear Association, the American Nuclear Society, the U.S. Department of Energy, and others showed that new nuclear power could potentially meet about 10% of the projected increase in data center demand over the next decade, assuming capacity is also significantly expanded by between 35GW and 62GW, and 30% of the expansion is earmarked for data centers. “Nuclear energy presents a potential solution for meeting some of the growing electricity demands of data centers, with its reliable and clean energy profile,” Deloitte’s report said, noting five key advantages of the technology: Reliable baseload power: Nuclear reactors operate 24/7, regardless of the weather, providing the reliable power so important to data centers. In addition, Deloitte said, “Their capacity factor, exceeding 92.5%, outperforms other sources like natural gas (56%) and renewables like wind (35%) and solar (25%).” High energy density: A small amount of fuel generates a lot of power, which minimizes the need for fuel storage and transportation. “This efficiency can translate to a smaller physical footprint and enhanced sustainability,” Deloitte said. Scalable power output: A full-sized reactor typically generates 800 megawatts (MW) or more of electricity, which accommodates the needs of large data centers. Low carbon emissions: Nuclear power plants produce virtually no greenhouse gas emissions during operation. Enhanced land use efficiency: Compared to other energy sources, nuclear power plants require relatively little land. Gartner’s Johnson echoed these advantages, and also predicted that nuclear energy, and small modular reactors (SMRs) in particular, will “provide a viable answer” to the question of what to do when electricity demand exceeds supply. 
They can, he said, “ensure independence from grid power fluctuations by providing dedicated on-site power for large data centers.” However, both Gartner and Deloitte also highlighted challenges in

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd).
John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.
While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.
1. Agents: the next generation of automation
AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.
Going all-in on red teaming pays practical, competitive dividends
It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
