
New approach to agent reliability, AgentSpec, forces agents to follow rules



AI agents have a safety and reliability problem. Agents would allow enterprises to automate more steps in their workflows, but they can take unintended actions while executing a task, are not very flexible, and are difficult to control.

Organizations have already sounded the alarm about unreliable agents, worried that once deployed, agents might forget to follow instructions. 

OpenAI even admitted that ensuring agent reliability would involve working with outside developers, so it opened up its Agents SDK to help solve this issue. 

But researchers from Singapore Management University (SMU) have developed a new approach to the agent reliability problem.

AgentSpec is a domain-specific framework that lets users “define structured rules that incorporate triggers, predicates and enforcement mechanisms.” The researchers said AgentSpec will make agents work only within the parameters that users want.

Guiding LLM-based agents with a new approach

AgentSpec is not a new LLM but rather an approach to guide LLM-based AI agents. The researchers believe AgentSpec can be used not only for agents in enterprise settings but also for self-driving applications.

The researchers first tested AgentSpec on the LangChain framework, but said they designed it to be framework-agnostic, meaning it can also run on ecosystems like AutoGen and Apollo.

Experiments using AgentSpec showed it prevented “over 90% of unsafe code executions, ensures full compliance in autonomous driving law-violation scenarios, eliminates hazardous actions in embodied agent tasks, and operates with millisecond-level overhead.” LLM-generated AgentSpec rules, which used OpenAI’s o1, also performed strongly, enforcing 87% of risky code executions and preventing “law-breaking in 5 out of 8 scenarios.”

Current methods are a little lacking

AgentSpec is not the only method to help developers bring more control and reliability to agents. Some of these approaches include ToolEmu and GuardAgent. The startup Galileo launched Agentic Evaluations, a way to ensure agents work as intended.

The open-source platform H2O.ai uses predictive models to make agents used by companies in the finance, healthcare, telecommunications and government sectors more accurate.

The AgentSpec researchers said that current approaches to risk mitigation, such as ToolEmu, effectively identify risks. They noted that “these methods lack interpretability and offer no mechanism for safety enforcement, making them susceptible to adversarial manipulation.”

Using AgentSpec

AgentSpec works as a runtime enforcement layer for agents. It intercepts the agent’s behavior while executing tasks and adds safety rules set by humans or generated by prompts.

Since AgentSpec is a custom domain-specific language, users need to define the safety rules. A rule has three components: the trigger, which lays out when to activate the rule; the check, which adds the conditions to evaluate; and the enforce, which specifies the actions to take if the rule is violated.
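For illustration, here is a minimal Python sketch of what a rule built from these three components could look like. The SafetyRule class, its field names and the example predicate are hypothetical stand-ins chosen for this sketch, not AgentSpec’s actual syntax, which is defined in the researchers’ paper.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an AgentSpec-style rule: a trigger says when to
# evaluate the rule, a check tests a condition on the proposed action, and an
# enforcement action rewrites or blocks the action when the check fails.
@dataclass
class SafetyRule:
    name: str
    trigger: Callable[[dict], bool]   # when to activate the rule
    check: Callable[[dict], bool]     # condition that must hold for the action to proceed
    enforce: Callable[[dict], dict]   # what to do if the rule is violated

# Example: block shell commands that would recursively delete files.
block_destructive_shell = SafetyRule(
    name="block_destructive_shell",
    trigger=lambda action: action.get("tool") == "shell",
    check=lambda action: "rm -rf" not in action.get("input", ""),
    enforce=lambda action: {"tool": "noop", "input": "", "blocked_by": "block_destructive_shell"},
)
```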

AgentSpec is built on LangChain, though, as previously stated, the researchers said AgentSpec can also be integrated into other frameworks like AutoGen or the autonomous vehicle software stack Apollo. 

These frameworks orchestrate the steps an agent needs to take: taking in the user input, creating an execution plan, observing the results, deciding whether the action was completed and, if not, planning the next step. AgentSpec adds rule enforcement into this flow.

“Before an action is executed, AgentSpec evaluates predefined constraints to ensure compliance, modifying the agent’s behavior when necessary. Specifically, AgentSpec hooks into three key decision points: before an action is executed (AgentAction), after an action produces an observation (AgentStep), and when the agent completes its task (AgentFinish). These points provide a structured way to intervene without altering the core logic of the agent,” the paper states. 
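As a rough, framework-neutral sketch of how such hooks could be wired up, the Python below wraps those three decision points around rule objects shaped like the SafetyRule example above. The class, method names and rule interface are assumptions made for illustration; they are not AgentSpec’s actual implementation or LangChain’s callback API.

```python
class EnforcementLayer:
    """Illustrative runtime enforcement wrapper, not the paper's implementation.

    Applies rules at the three decision points the paper describes: before an
    action runs (AgentAction), after an action yields an observation
    (AgentStep), and when the agent finishes its task (AgentFinish).
    """

    def __init__(self, rules):
        self.rules = rules

    def before_action(self, action: dict) -> dict:
        # AgentAction hook: rewrite or block a proposed tool call that violates a rule.
        for rule in self.rules:
            if rule.trigger(action) and not rule.check(action):
                action = rule.enforce(action)
        return action

    def after_observation(self, action: dict, observation: str) -> str:
        # AgentStep hook: a rule could inspect, redact or flag the tool output here.
        return observation

    def on_finish(self, final_output: str) -> str:
        # AgentFinish hook: a last chance to audit or veto the agent's final answer.
        return final_output
```

In this sketch, the wrapper would sit between the planner and the tool executor: every proposed tool call passes through before_action, every tool result through after_observation, and the final answer through on_finish, so rules are applied without changing the agent’s core reasoning logic.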

More reliable agents

Approaches like AgentSpec underscore the need for reliable agents in the enterprise. As organizations begin to plan their agentic strategy, technology decision-makers are also looking at ways to ensure reliability.

For many, agents will eventually autonomously and proactively do tasks for users. The idea of ambient agents, where AI agents and apps continuously run in the background and trigger themselves to execute actions, would require agents that do not stray from their path and accidentally introduce unsafe actions.

If ambient agents are where agentic AI will go in the future, expect more methods like AgentSpec to proliferate as companies seek to make AI agents continuously reliable. 

