
What you need to know about Amazon Nova Act: the new AI agent SDK challenging OpenAI, Microsoft, Salesforce



The sleeping giant has awoken!

For a while, it seemed like Amazon was playing catch-up in the race to offer its users — particularly the millions of developers building atop the Amazon Web Services (AWS) cloud infrastructure — compelling first-party AI models and tools.

But in late 2024, it debuted its own internal foundation model family, Amazon Nova, with text, image and even video generation capabilities, and last month it unveiled a new Amazon Alexa voice assistant powered in part by Anthropic’s Claude family of models.

Then, on Monday, the e-commerce and cloud giant’s artificial general intelligence division, Amazon AGI, announced the release of Amazon Nova Act, an experimental developer kit for building AI agents that can navigate the web and complete tasks autonomously, powered by a custom, proprietary version of Amazon’s Nova large language model (LLM). Oh, and the software development kit (SDK) is open source under a permissive Apache 2.0 license, though it is designed to work only with Amazon’s in-house custom Nova model, not any third-party ones.

The goal is to enable third-party developers to build AI agents capable of reliably performing tasks within web browsers.

But how does Amazon’s Nova Act stack up to other agent building platforms out there on the market, such as Microsoft’s AutoGen, Salesforce’s Agentforce, and of course, OpenAI’s recently released open source Agents SDK?

A different, more thoughtful approach to AI agents

Since the public rise of large language models (LLMs), most “agent” systems have been limited to responding in natural language or providing information by querying knowledge bases.

Nova Act is part of the larger industry shift toward action-based agents—systems that can complete actual tasks across digital environments on behalf of the user. OpenAI’s new Responses API, which gives users access to its autonomous browser navigator, is one leading example; developers can integrate it into AI agents through the OpenAI Agents SDK.

Amazon AGI emphasizes that current agent systems, while promising, struggle with reliability and often require human supervision, especially when handling multi-step or complex workflows.

Nova Act is specifically designed to address these limitations by providing a set of atomic, prescriptive commands that can be chained together into reliable workflows.

Deniz Birlikci, a Member of Technical Staff at Amazon, described the broader vision in a video introducing Nova Act: soon, there will be more AI agents than people browsing the web, carrying out tasks on behalf of users.

David Luan, VP of Amazon’s Autonomy Team and Head of AGI SF Lab, framed the mission more directly in a recent video call interview with VentureBeat: “We’ve created this new experimental AI model that is trained to perform actions in a web browser. Fundamentally, we think that agents are the building block of computing,” he said.

Luan, formerly a co-founder and CEO of Adept AI, joined Amazon in 2024 as part of an acqui-hire, and said he has long been a proponent of AI agents. “With Adept, we were the first company to really start working on AI agents. At this point, everybody knows how important agents are. It was pretty cool to be a bit ahead of our time,” he added.

What Nova Act offers devs

The Nova Act SDK provides developers with a framework for constructing web-based automation agents using natural language prompts broken down into clear, manageable steps.

Unlike typical LLM-powered agents that attempt entire workflows from a single prompt—often resulting in unreliable behavior—Nova Act is designed to incrementally execute smaller, verifiable tasks.
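
Amazon’s published quick-start follows exactly this pattern. Below is a minimal sketch of what chaining `act()` calls looks like, adapted from the SDK’s documented example; the starting page and prompts are illustrative, and the SDK requires an API key generated at nova.amazon.com:

```python
# pip install nova-act
from nova_act import NovaAct

# Each act() call is one small, verifiable step rather than a single
# monolithic prompt describing the whole workflow.
with NovaAct(starting_page="https://www.amazon.com") as nova:
    nova.act("search for a coffee maker")
    nova.act("select the first result")
    nova.act("scroll down or up until you see 'add to cart', then click it")
```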

Some of the key features of Nova Act include:

  • Fine-Grained Task Decomposition: Developers can break down complex digital workflows into smaller act() calls, each guiding the agent to perform specific UI interactions.
  • Direct Browser Manipulation via Playwright: Nova Act integrates with Playwright, an open-source browser automation framework developed by Microsoft. Playwright allows developers to control web browsers programmatically—clicking elements, filling forms, or navigating pages—without relying solely on AI predictions. This integration is particularly useful for handling sensitive tasks such as entering passwords or credit card details: instead of sending sensitive information to the model, developers can instruct Nova Act to focus on a password field and then use Playwright APIs to enter the password without the model ever “seeing” it (see the sketch after this list). This approach strengthens security and privacy when automating web interactions.
  • Python Integration: The SDK allows developers to interleave Python code with Nova Act commands, including standard Python tools such as breakpoints, assertions, or thread pooling for parallel execution.
  • Structured Information Extraction: The SDK supports structured data extraction through Pydantic schemas, allowing agents to convert screen content into structured formats.
  • Parallelization and Scheduling: Developers can run multiple Nova Act instances concurrently and schedule automated workflows without the need for continuous human oversight.
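
For the sensitive-input case flagged in the Playwright bullet above, Amazon’s documentation sketches an approach along these lines. This is a hedged reconstruction rather than verbatim sample code; the login page and username are placeholders. The key point is that `nova.page` exposes the underlying Playwright page object, so the secret is typed by Playwright directly and never enters the model’s prompt:

```python
from getpass import getpass

from nova_act import NovaAct

with NovaAct(starting_page="https://example.com/login") as nova:
    # Non-sensitive navigation is delegated to the model...
    nova.act("enter username janedoe and then click on the password field")
    # ...but the password is typed via the Playwright API, so the model
    # never "sees" the credential in its prompt or screenshots.
    nova.page.keyboard.type(getpass("Password: "))
    nova.act("sign in")
```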

Luan emphasized that Nova Act is a tool for developers rather than a general-purpose chatbot. “Nova Act is built for developers. It’s not a chatbot you talk to for fun. It’s designed to let developers start building useful products,” he said.

For example, one of the sample workflows demonstrated in Amazon’s documentation shows how Nova Act can automate apartment searches by scraping rental listings and calculating biking distance to train stations, then sorting the results in a structured table.
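
The SDK’s Pydantic support is what turns that screen content into a sorted table. Here is a minimal sketch of the pattern, assuming an illustrative rental-listings site and schema rather than Amazon’s exact sample:

```python
from pydantic import BaseModel

from nova_act import NovaAct

class Apartment(BaseModel):
    address: str
    price: str
    bike_minutes_to_station: float

class ApartmentList(BaseModel):
    apartments: list[Apartment]

with NovaAct(starting_page="https://example-rentals.test") as nova:
    nova.act("search for 2-bedroom apartments near the train station")
    # Ask the agent to return on-screen content as JSON matching the schema.
    result = nova.act(
        "Return the visible apartment listings",
        schema=ApartmentList.model_json_schema(),
    )
    if result.matches_schema:
        listings = ApartmentList.model_validate(result.parsed_response)
        # Sort the structured results, e.g. by biking distance.
        for apt in sorted(listings.apartments,
                          key=lambda a: a.bike_minutes_to_station):
            print(apt)
```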

Another showcased example uses Nova Act to order a specific salad from Sweetgreen every Tuesday, entirely hands-free and on a schedule, illustrating how developers can automate repeatable digital tasks in a way that feels reliable and customizable.
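
Amazon hasn’t published the exact code behind that demo, but because the SDK is plain Python, wiring a Nova Act session to an off-the-shelf scheduler is straightforward. A purely illustrative sketch using the third-party `schedule` library (the restaurant URL, prompt, and `headless` option reflect assumptions about the setup, not Amazon’s sample):

```python
import time

import schedule  # pip install schedule
from nova_act import NovaAct

def order_salad():
    # headless=True runs the browser without a visible window,
    # so the job can fire unattended.
    with NovaAct(starting_page="https://www.sweetgreen.com",
                 headless=True) as nova:
        nova.act("order my usual salad for pickup and check out")

# Fire every Tuesday at 11:30 local time, with no human in the loop.
schedule.every().tuesday.at("11:30").do(order_salad)

while True:
    schedule.run_pending()
    time.sleep(60)
```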

Benchmark performance and a focus on reliability

A central message in Amazon’s announcement is that reliability, not just intelligence, is the key barrier to widespread agent adoption.

Current state-of-the-art models are actually quite brittle at powering AI agents, with agents typically achieving 30% to 60% success rates on browser-based multi-step tasks, according to Amazon.

Nova Act, however, emphasizes a building-block approach, scoring over 90% on internal evaluations of tasks that challenge other models—such as interacting with dropdowns, date pickers, or pop-ups.

Luan underscored why that reliability focus matters. “What we’ve really focused on is how do you actually make agents reliable? If you ask it to update a record in Salesforce and it deletes your database one out of ten times, you’re probably never going to use it again,” he said.

Amazon AGI benchmarked Nova Act against competing models including Anthropic’s Claude 3.7 Sonnet and OpenAI’s CUA model. On the ScreenSpot Web Text benchmark, which tests instruction-following on textual screen elements, Nova Act achieved a score of 0.939, outperforming Claude 3.7 Sonnet (0.900) and OpenAI CUA (0.883).

Amazon Nova Act benchmarks. Credit: Amazon

On the ScreenSpot Web Icon benchmark, which focuses on visual UI elements, Nova Act scored 0.879, again ahead of the other models.

However, on the GroundUI Web benchmark, which tests general UI interaction, Nova Act scored 0.805, slightly behind its competitors.

These scores were measured internally by Amazon using consistent prompts and evaluation criteria.

Amazon also highlighted early results in Nova Act’s ability to generalize beyond standard environments.

For instance, team member Rick Liu demonstrated how the agent, without explicit training, successfully interacted with a pigeon-themed web game—assigning stats, battling opponents, and progressing in the game.

According to Luan, that ability to generalize is central to the long-term vision. “Our goal with Nova Act is to be a universal browser-use solution. We want an agent that can do anything you want to do on a computer for you,” he said.

Flexible for use in different clouds, but locked to Amazon’s Nova model

While Nova Act is accessible to developers globally through nova.amazon.com, Luan clarified that the system is tightly coupled to Amazon’s in-house Nova foundation models.

Developers cannot plug in external LLMs such as OpenAI’s GPT-4o or Anthropic’s Claude 3.7 Sonnet, in contrast to OpenAI’s Agents SDK and, to a lesser extent, Microsoft’s AutoGen and Salesforce’s Agentforce platforms, which allow switching among a few different providers and model families.

“Nova Act is a custom trained version of the Nova model,” he said. “It’s not just a scaffolding over a generic LLM. It’s natively trained to act on the internet on your behalf.”

However, Nova Act is not restricted to AWS environments. Developers can download the SDK and run it locally, in the cloud, or wherever they choose. “You don’t need to be on AWS to use it,” Luan stated.

Thus, for businesses looking for maximum underlying model flexibility for their agents, Nova Act is probably not the best choice. However, for those looking for a purpose-built model specifically designed to navigate the web and perform actions across a wide variety of websites with very different user interfaces (UIs), it’s probably worth a look — especially if you’re already in the Amazon or AWS developer ecosystem.

Security, licensing and pricing

The Nova Act SDK is released under the Apache License, Version 2.0, a permissive open source license. However, this applies only to the SDK software.

The Nova Act model itself, along with its weights and training data, is proprietary and remains closed-source. The approach is intentional, according to Luan, who explained that the model is tightly integrated and co-trained with the SDK to achieve reliability.

At launch, Nova Act is offered as a free research preview. There is no announced pricing for production use yet.

Luan described this phase as an opportunity for developers to experiment and build with the technology. “Our belief is that the majority of the most useful agent products have not yet been built. We want to enable anybody to build a really useful agent, whether for themselves or as a product,” he said.

Longer term, Amazon plans to introduce production-grade terms, including usage-based billing and scaling guarantees, but those are not yet available.

What’s next for Nova Act?

The release of Nova Act reflects Amazon’s broader ambition to make action-oriented AI agents a foundational component of computing.

Luan summed up the opportunity ahead: “My personal dream is that agents become the building block of computing, and the coolest new startups and products get built on top of what our team is developing.”

The Nova Act SDK is available now for experimentation and prototyping on Amazon’s website and on GitHub.
