For robots to be truly helpful in our daily lives and industries, they must do more than follow instructions; they must reason about the physical world. From navigating a complex facility to interpreting the needle on a pressure gauge, a robot’s “embodied reasoning” is what allows it to bridge the gap between digital intelligence and physical action.
Today, we’re introducing Gemini Robotics-ER 1.6, a significant upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision. By enhancing spatial reasoning and multi-view understanding, we are bringing a new level of autonomy to the next generation of physical agents.
This model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning, and success detection. It acts as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools such as Google Search to find information, vision-language-action models (VLAs) to carry out actions, or any other third-party user-defined functions.
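
To make the tool-calling pattern concrete, here is a minimal sketch using the google-genai Python SDK, in which a downstream VLA skill is registered as a user-defined function the model can call. The model ID, the `execute_skill` function, and its parameters are illustrative placeholders rather than part of any official interface; the developer Colab is the authoritative reference.

```python
# A minimal sketch of user-defined function calling, assuming the google-genai
# Python SDK. The model ID and the execute_skill tool are illustrative
# placeholders, not an official interface.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
MODEL_ID = "gemini-robotics-er-1.6"  # placeholder; check the Gemini API model list

# Describe a downstream VLA skill as a callable tool.
execute_skill = types.FunctionDeclaration(
    name="execute_skill",  # hypothetical robot-side function
    description="Ask the low-level VLA controller to perform a named manipulation skill.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "skill": types.Schema(type=types.Type.STRING, description="e.g. 'pick' or 'place'"),
            "object": types.Schema(type=types.Type.STRING, description="Target object."),
        },
        required=["skill", "object"],
    ),
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents="Clear the mug from the workbench.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[execute_skill])],
    ),
)

# The model plans the task and emits function calls for the robot stack to execute.
for call in response.function_calls or []:
    print(call.name, dict(call.args))
```
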
Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, with particular gains in spatial and physical reasoning capabilities such as pointing, counting, and success detection. We are also unlocking a new capability: instrument reading, enabling robots to read complex gauges and sight glasses — a use case we discovered through close collaboration with our partner, Boston Dynamics.
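
For a capability like pointing, the typical pattern is to pass an image of the scene along with a natural-language query and ask for structured output. The sketch below assumes the JSON point format documented for Gemini Robotics-ER 1.5 (points as [y, x] coordinates normalized to 0–1000); the model ID, image path, and output format should all be verified against the developer Colab.

```python
# A minimal pointing sketch, assuming the google-genai SDK and the [y, x]
# 0-1000 normalized point format documented for Gemini Robotics-ER 1.5.
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
MODEL_ID = "gemini-robotics-er-1.6"  # placeholder model ID

with open("workbench.jpg", "rb") as f:  # example image path
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

prompt = (
    "Point to the pressure gauge and the shutoff valve. "
    'Answer as a JSON list like [{"point": [y, x], "label": "<name>"}], '
    "with coordinates normalized to 0-1000."
)

response = client.models.generate_content(model=MODEL_ID, contents=[image, prompt])

# Strip a possible ```json fence before parsing, then report normalized coordinates.
text = response.text.strip().removeprefix("```json").removesuffix("```")
for item in json.loads(text):
    y, x = item["point"]
    print(f"{item['label']}: ({x / 1000:.3f}, {y / 1000:.3f}) in normalized image coords")
```
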
Starting today, Gemini Robotics-ER 1.6 is available to developers via the Gemini API and Google AI Studio. To help you get started, we are sharing a developer Colab containing examples of how to configure the model and prompt it for embodied reasoning tasks.
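
As a preview of the kind of configuration the Colab walks through, the sketch below shows how one might trade latency for reasoning depth using a thinking budget, a knob exposed for the previous Robotics-ER release, applied here to a success-detection query. The model ID and the specific budget values are assumptions to confirm against the Colab and the Gemini API documentation.

```python
# A minimal configuration sketch, assuming the google-genai SDK. Thinking budgets
# are documented for Gemini Robotics-ER 1.5; whether the same values apply to this
# release should be checked against the developer Colab.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
MODEL_ID = "gemini-robotics-er-1.6"  # placeholder model ID

with open("after_task.jpg", "rb") as f:  # example image captured after the attempt
    frame = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

config = types.GenerateContentConfig(
    temperature=0.5,
    # A budget of 0 favors low latency (e.g. queries inside a control loop);
    # larger budgets favor harder planning and success-detection questions.
    thinking_config=types.ThinkingConfig(thinking_budget=0),
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[frame, "Has the mug been placed on the shelf? Answer yes or no."],
    config=config,
)
print(response.text)
```
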





















