
Five ways that AI is learning to improve itself

Last week, Mark Zuckerberg declared that Meta is aiming to achieve smarter-than-human AI. He seems to have a recipe for achieving that goal, and the first ingredient is human talent: Zuckerberg has reportedly tried to lure top researchers to Meta Superintelligence Labs with nine-figure offers. The second ingredient is AI itself. Zuckerberg recently said on an earnings call that Meta Superintelligence Labs will be focused on building self-improving AI—systems that can bootstrap themselves to higher and higher levels of performance.

The possibility of self-improvement distinguishes AI from other revolutionary technologies. CRISPR can’t improve its own targeting of DNA sequences, and fusion reactors can’t figure out how to make the technology commercially viable. But LLMs can optimize the computer chips they run on, train other LLMs cheaply and efficiently, and perhaps even come up with original ideas for AI research. And they’ve already made some progress in all these domains.

According to Zuckerberg, AI self-improvement could bring about a world in which humans are liberated from workaday drudgery and can pursue their highest goals with the support of brilliant, hypereffective artificial companions. But self-improvement also creates a fundamental risk, according to Chris Painter, the policy director at the AI research nonprofit METR. If AI accelerates the development of its own capabilities, he says, it could rapidly get better at hacking, designing weapons, and manipulating people. Some researchers even speculate that this positive feedback cycle could lead to an “intelligence explosion,” in which AI rapidly launches itself far beyond the level of human capabilities.

But you don’t have to be a doomer to take the implications of self-improving AI seriously. OpenAI, Anthropic, and Google all include references to automated AI research in their AI safety frameworks, alongside more familiar risk categories such as chemical weapons and cybersecurity. “I think this is the fastest path to powerful AI,” says Jeff Clune, a professor of computer science at the University of British Columbia and senior research advisor at Google DeepMind. “It’s probably the most important thing we should be thinking about.”

By the same token, Clune says, automating AI research and development could have enormous upsides. On our own, we humans might not be able to think up the innovations and improvements that will allow AI to one day tackle prodigious problems like cancer and climate change.

For now, human ingenuity is still the primary engine of AI advancement; otherwise, Meta would hardly have made such exorbitant offers to attract researchers to its superintelligence lab. But AI is already contributing to its own development, and it's set to take on an even bigger role in the years to come. Here are five ways that AI is making itself better.

1. Enhancing productivity

Today, the most important contribution that LLMs make to AI development may also be the most banal. “The biggest thing is coding assistance,” says Tom Davidson, a senior research fellow at Forethought, an AI research nonprofit. Tools that help engineers write software more quickly, such as Claude Code and Cursor, appear popular across the AI industry: Google CEO Sundar Pichai claimed in October 2024 that a quarter of the company’s new code was generated by AI, and Anthropic recently documented a wide variety of ways that its employees use Claude Code. If engineers are more productive because of this coding assistance, they will be able to design, test, and deploy new AI systems more quickly.

But the productivity advantage that these tools confer remains uncertain: if engineers spend large amounts of time correcting errors made by AI systems, they might not be getting any more work done, even if they are spending less time writing code manually. A recent study from METR found that developers take about 20% longer to complete tasks when using AI coding assistants, though Nate Rush, a member of METR's technical staff who co-led the study, notes that it examined only extremely experienced developers working on large codebases. Its conclusions might not apply to AI researchers who write quick scripts to run experiments.

Conducting a similar study within the frontier labs could help provide a much clearer picture of whether coding assistants are making AI researchers at the cutting edge more productive, Rush says—but that work hasn’t yet been undertaken. In the meantime, just taking software engineers’ word for it isn’t enough: The developers METR studied thought that the AI coding tools had made them work more efficiently, even though the tools had actually slowed them down substantially.

2. Optimizing infrastructure

Writing code quickly isn’t that much of an advantage if you have to wait hours, days, or weeks for it to run. LLM training, in particular, is an agonizingly slow process, and the most sophisticated reasoning models can take many minutes to generate a single response. These delays are major bottlenecks for AI development, says Azalia Mirhoseini, an assistant professor of computer science at Stanford University and senior staff scientist at Google DeepMind. “If we can run AI faster, we can innovate more,” she says.

That’s why Mirhoseini has been using AI to optimize AI chips. Back in 2021, she and her collaborators at Google built a non-LLM AI system that could decide where to place various components on a computer chip to optimize efficiency. Although some other researchers failed to replicate the study’s results, Mirhoseini says that Nature investigated the paper and upheld the work’s validity—and she notes that Google has used the system’s designs for multiple generations of its custom AI chips.

More recently, Mirhoseini has applied LLMs to the problem of writing kernels, the low-level functions that control how operations like matrix multiplication are carried out on chips. She's found that even general-purpose LLMs can, in some cases, write kernels that run faster than the human-designed versions.
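
The harness for this kind of work is simple to sketch, even without access to her tooling: generate a candidate kernel, check that it produces the same answer as a trusted baseline, and time the two against each other. Here is a minimal illustration in Python; the body of `candidate_matmul` merely stands in for model-generated code, and a real pipeline would compile the candidate (as a CUDA or Triton kernel, say) before benchmarking it.

```python
import timeit
import numpy as np

def baseline_matmul(a, b):
    # Reference implementation: NumPy's tuned matrix multiply.
    return a @ b

def candidate_matmul(a, b):
    # Stand-in for an LLM-generated kernel. In a real pipeline this body
    # would be code returned by the model, compiled for the target chip.
    return np.einsum("ij,jk->ik", a, b)

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

# Correctness gate: reject any candidate that changes the output.
assert np.allclose(baseline_matmul(a, b), candidate_matmul(a, b))

# Speed gate: keep the candidate only if it beats the baseline.
t_base = timeit.timeit(lambda: baseline_matmul(a, b), number=20)
t_cand = timeit.timeit(lambda: candidate_matmul(a, b), number=20)
print(f"baseline: {t_base:.3f}s  candidate: {t_cand:.3f}s")
```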

Elsewhere at Google, scientists built a system that they used to optimize various parts of the company's LLM infrastructure. The system, called AlphaEvolve, prompts Google's Gemini LLM to write algorithms for solving a given problem, evaluates those algorithms, asks Gemini to improve on the most successful, and repeats that process several times. AlphaEvolve designed a new approach for running data centers that saved 0.7% of Google's computational resources, made further improvements to Google's custom chip design, and devised a new kernel that sped up Gemini's training by 1%.
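
Stripped of its engineering, that generate-evaluate-refine cycle is a small evolutionary loop. The sketch below illustrates the idea only; it is not Google's code, and `llm_propose` and `evaluate` are hypothetical callables standing in for the Gemini call and the scoring harness.

```python
import random

def alphaevolve_style_search(task_prompt, evaluate, llm_propose,
                             generations=10, pool_size=8):
    """Evolutionary LLM loop in the spirit of AlphaEvolve (a sketch).

    llm_propose(prompt, parents) asks an LLM for a new candidate program,
    optionally conditioned on strong parents; evaluate(program) returns a
    numeric score. Both are assumptions, not Google's API.
    """
    population = [llm_propose(task_prompt, parents=[]) for _ in range(pool_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        survivors = scored[: pool_size // 2]  # keep the best half
        children = [
            # Ask the LLM to improve on two of the strongest candidates.
            llm_propose(task_prompt, parents=random.sample(survivors, 2))
            for _ in range(pool_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=evaluate)
```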

That might sound like a small improvement, but at a huge company like Google it equates to enormous savings of time, money, and energy. And Matej Balog, a staff research scientist at Google DeepMind who led the AlphaEvolve project, says that he and his team tested the system on only a small component of Gemini’s overall training pipeline. Applying it more broadly, he says, could lead to more savings.

3. Automating training

LLMs are famously data hungry, and training them is costly at every stage. In some specific domains—unusual programming languages, for example—real-world data is too scarce to train LLMs effectively. Reinforcement learning from human feedback (RLHF), a technique in which humans score LLM responses to prompts and the LLMs are then trained using those scores, has been key to creating models that behave in line with human standards and preferences, but obtaining human feedback is slow and expensive.

Increasingly, LLMs are being used to fill in the gaps. If prompted with plenty of examples, LLMs can generate plausible synthetic data in domains in which they haven’t been trained, and that synthetic data can then be used for training. LLMs can also be used effectively for reinforcement learning: In an approach called “LLM as a judge,” LLMs, rather than humans, are used to score the outputs of models that are being trained. That approach is key to the influential “Constitutional AI” framework proposed by Anthropic researchers in 2022, in which one LLM is trained to be less harmful based on feedback from another LLM.
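
The mechanics are straightforward to sketch. In the toy version below, a judge model scores each response against a short rubric, and those machine-assigned scores take the place of the human ratings in ordinary RLHF. Here `judge_llm` is a hypothetical text-in, text-out callable, not any specific vendor's API.

```python
def judge_score(judge_llm, prompt, response, principles):
    """Toy 'LLM as a judge' reward: the judge model, not a human,
    assigns the score used to train the target model."""
    rubric = (
        "Rate the response from 1 (harmful or unhelpful) to 10 (harmless "
        f"and helpful), following these principles: {principles}\n\n"
        f"Prompt: {prompt}\nResponse: {response}\nScore:"
    )
    reply = judge_llm(rubric)
    try:
        return int(reply.strip().split()[0])  # parse the leading number
    except (ValueError, IndexError):
        return None  # unusable judgment; drop this example from training
```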

Data scarcity is a particularly acute problem for AI agents. Effective agents need to be able to carry out multistep plans to accomplish particular tasks, but examples of successful step-by-step task completion are scarce online, and using humans to generate new examples would be pricey. To overcome this limitation, Stanford’s Mirhoseini and her colleagues have recently piloted a technique in which an LLM agent generates a possible step-by-step approach to a given problem, an LLM judge evaluates whether each step is valid, and then a new LLM agent is trained on those steps. “You’re not limited by data anymore, because the model can just arbitrarily generate more and more experiences,” Mirhoseini says.
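
A rough sketch of that pipeline, with `generator` and `judge` as hypothetical stand-ins for the two LLMs rather than Mirhoseini's actual system:

```python
def synthesize_agent_traces(tasks, generator, judge, max_traces=1000):
    """Keep only the step-by-step solutions that survive the judge.

    generator(task) returns a proposed list of steps; judge(task, step)
    returns True if a step is valid. Both are assumptions for illustration.
    """
    dataset = []
    for task in tasks:
        steps = generator(task)  # LLM agent proposes a multistep plan
        if steps and all(judge(task, step) for step in steps):
            dataset.append({"task": task, "steps": steps})
        if len(dataset) >= max_traces:
            break
    return dataset  # a new agent is then trained on these validated traces
```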

4. Perfecting agent design

One area where LLMs haven’t yet made major contributions is in the design of LLMs themselves. Today’s LLMs are all based on a neural-network structure called a transformer, which was proposed by human researchers in 2017, and the notable improvements that have since been made to the architecture were also human-designed. 

But the rise of LLM agents has created an entirely new design universe to explore. Agents need tools to interact with the outside world and instructions for how to use them, and optimizing those tools and instructions is essential to producing effective agents. “Humans haven’t spent as much time mapping out all these ideas, so there’s a lot more low-hanging fruit,” Clune says. “It’s easier to just create an AI system to go pick it.”

Together with researchers at the startup Sakana AI, Clune created a system called a “Darwin Gödel Machine”: an LLM agent that can iteratively modify its prompts, tools, and other aspects of its code to improve its own task performance. Not only did the Darwin Gödel Machine achieve higher task scores through modifying itself, but as it evolved, it also managed to find new modifications that its original version wouldn’t have been able to discover. It had entered a true self-improvement loop.
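
The published system is far more elaborate, but its core loop can be sketched in a few lines: keep an archive of agent variants, branch from any of them, let the agent patch itself, and score the result. In the sketch below, `propose_patch` and `benchmark` are hypothetical stand-ins, not Sakana AI's implementation.

```python
import random

def darwin_godel_style_loop(initial_agent, benchmark, propose_patch, rounds=50):
    """Self-referential improvement loop in the spirit of the
    Darwin Godel Machine (a sketch under assumed interfaces)."""
    archive = [(initial_agent, benchmark(initial_agent))]
    for _ in range(rounds):
        parent, _ = random.choice(archive)  # branch from any archived variant
        child = propose_patch(parent)       # agent rewrites part of itself
        archive.append((child, benchmark(child)))
    # Keeping every variant, not just the best, lets the search escape local
    # optima: later variants can reach edits the original never would.
    return max(archive, key=lambda pair: pair[1])
```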

5. Advancing research

Although LLMs are speeding up numerous parts of the LLM development pipeline, humans may still remain essential to AI research for quite a while. Many experts point to “research taste,” or the ability that the best scientists have to pick out promising new research questions and directions, as both a particular challenge for AI and a key ingredient in AI development. 

But Clune says research taste might not be as much of a challenge for AI as some researchers think. He and Sakana AI researchers are working on an end-to-end system for AI research that they call the “AI Scientist.” It searches through the scientific literature to determine its own research question, runs experiments to answer that question, and then writes up its results.

One paper that it wrote earlier this year, in which it devised and tested a new training strategy aimed at making neural networks better at combining examples from their training data, was anonymously submitted to a workshop at the International Conference on Machine Learning, or ICML—one of the most prestigious conferences in the field—with the consent of the workshop organizers. The training strategy didn’t end up working, but the paper was scored highly enough by reviewers to qualify it for acceptance (it is worth noting that ICML workshops have lower standards for acceptance than the main conference). In another instance, Clune says, the AI Scientist came up with a research idea that was later independently proposed by a human researcher on X, where it attracted plenty of interest from other scientists.

“We are looking right now at the GPT-1 moment of the AI Scientist,” Clune says. “In a few short years, it is going to be writing papers that will be accepted at the top peer-reviewed conferences and journals in the world. It will be making novel scientific discoveries.”

Is superintelligence on its way?

With all this enthusiasm for AI self-improvement, it seems likely that in the coming months and years, the contributions AI makes to its own development will only multiply. To hear Mark Zuckerberg tell it, this could mean that superintelligent models, which exceed human capabilities in many domains, are just around the corner. In reality, though, the impact of self-improving AI is far from certain.

It’s notable that AlphaEvolve has sped up the training of its own core LLM system, Gemini—but that 1% speedup may not observably change the pace of Google’s AI advancements. “This is still a feedback loop that’s very slow,” says Balog, the AlphaEvolve researcher. “The training of Gemini takes a significant amount of time. So you can maybe see the exciting beginnings of this virtuous [cycle], but it’s still a very slow process.”

If each subsequent version of Gemini speeds up its own training by an additional 1%, those accelerations will compound. And because each successive generation will be more capable than the previous one, it should be able to achieve even greater training speedups—not to mention all the other ways it might devise to improve itself. Under such circumstances, proponents of superintelligence argue, an eventual intelligence explosion looks inevitable.
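
The arithmetic behind that claim is easy to make concrete. Compounding a flat 1% speedup is modest; the explosion scenario depends on each generation finding bigger gains than the last. In the back-of-the-envelope calculation below, only the 1% figure comes from the reporting above; the rest is illustrative.

```python
# Flat gains: each generation trains 1% faster than the last.
speedup = 1.0
for _ in range(10):
    speedup *= 1.01
print(f"flat 1% gains, 10 generations: {speedup:.3f}x")  # ~1.105x, still modest

# The explosion scenario needs the per-generation gain itself to grow:
speedup, gain = 1.0, 0.01
for _ in range(10):
    speedup *= 1 + gain
    gain *= 2  # each generation finds twice the improvement of the last
print(f"growing gains, 10 generations: {speedup:.1f}x")  # ~144x
```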

This conclusion, however, ignores a key observation: Innovation gets harder over time. In the early days of any scientific field, discoveries come fast and easy. There are plenty of obvious experiments to run and ideas to investigate, and none of them have been tried before. But as the science of deep learning matures, finding each additional improvement might require substantially more effort on the part of both humans and their AI collaborators. It’s possible that by the time AI systems attain human-level research abilities, humans or less-intelligent AI systems will already have plucked all the low-hanging fruit.

Determining the real-world impact of AI self-improvement, then, is a mighty challenge. To make matters worse, the AI systems that matter most for AI development—those being used inside frontier AI companies—are likely more advanced than those that have been released to the general public, so measuring o3’s capabilities might not be a great way to infer what’s happening inside OpenAI.

But external researchers are doing their best—by, for example, tracking the overall pace of AI development to determine whether or not that pace is accelerating. METR is monitoring advancements in AI abilities by measuring how long it takes humans to do tasks that cutting-edge systems can complete themselves. They’ve found that the length of tasks that AI systems can complete independently has, since the release of GPT-2 in 2019, doubled every seven months. 

Since 2024, that doubling time has shortened to four months, which suggests that AI progress is indeed accelerating. There may be unglamorous reasons for that: Frontier AI labs are flush with investor cash, which they can spend on hiring new researchers and purchasing new hardware. But it’s entirely plausible that AI self-improvement could also be playing a role.
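
Converted into annual growth, those doubling times are strikingly different; the conversion is just an exponent, with the seven- and four-month figures taken from METR's findings above.

```python
# Converting METR's doubling times into annual growth factors.
for months_to_double in (7, 4):
    growth_per_year = 2 ** (12 / months_to_double)
    print(f"doubling every {months_to_double} months -> "
          f"{growth_per_year:.1f}x longer tasks per year")
# doubling every 7 months -> 3.3x longer tasks per year
# doubling every 4 months -> 8.0x longer tasks per year
```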

That’s just one indirect piece of evidence. But Davidson, the Forethought researcher, says there’s good reason to expect that AI will supercharge its own advancement, at least for a time. METR’s work suggests that the low-hanging-fruit effect isn’t slowing down human researchers today, or at least that increased investment is effectively counterbalancing any slowdown. If AI notably increases the productivity of those researchers, or even takes on some fraction of the research work itself, that balance will shift in favor of research acceleration.

“You would, I think, strongly expect that there’ll be a period when AI progress speeds up,” Davidson says. “The big question is how long it goes on for.”

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Mapping Trump’s tariffs by trade balance and geography

U.S. importers may soon see costs rise for many imported goods, as tariffs on foreign goods are set to rise. On July 31, President Donald Trump announced country-specific reciprocal tariffs would finally be implemented on Aug. 7, after a monthslong pause. The news means more than 90 countries will see

Read More »

JF Expands in Southwest with Maverick Acquisition

The JF Group (JF) has acquired Arizona-based Maverick Petroleum Services. JF, a fueling infrastructure, petroleum equipment distribution, service, general contracting, and construction services provider, said in a media release that Maverick brings expertise in the installation, maintenance, and repair of petroleum handling equipment, Point-of-Sale (POS) systems, and environmental testing. As

Read More »

Glencore Shakes Up Trading Team

The head of Glencore Plc’s huge coal-trading operation is leaving in the biggest shake-up of the trading unit in years, at a time when the commodity giant is struggling to revive its share price.  The company, which traces its roots to the legendary commodity trader Marc Rich, has been reviewing its mining and smelting assets and recently unveiled a $1 billion cost cutting target. On Wednesday, it disappointed investors with weak results for the first half of the year that included one of the worst performances from its energy- and coal-trading unit on record. Glencore is reshuffling its trading team as Ruan van Schalkwyk, 42, a longstanding executive who runs coal trading, is retiring, according to a memo from Chief Executive Officer Gary Nagle that was seen by Bloomberg News. Glencore is the world’s largest shipper of coal.  Jyothish George, currently head of copper and cobalt trading, will take on a wider role as head of metals, iron ore and coal trading, according to the memo. Several trading executives who currently report to Nagle will now report to him, including Peter Hill, the head of iron ore, and Robin Scheiner, head of alumina and aluminum.  Alex Sanna, who runs oil, gas and power trading, will continue to report to Nagle. Under George, trading responsibilities are being reassigned. David Thomas, currently head of ferroalloys trading, will take over thermal coal. Paymahn Seyed-Safi will have responsibility for chrome as well as nickel; and Hill will take over responsibility for metallurgical coal, vanadium and manganese as well as iron ore. The changes come after Glencore’s trading teams reported starkly different results for the first half of the year. The company’s metals traders notched up their best half-yearly performance on record, while their energy and coal-trading peers struggled to even turn a profit.  Van Schalkwyk ran ferroalloys trading

Read More »

AI could cut disaster infrastructure losses by 15%, new research finds

Dive Brief: AI applications such as predictive maintenance and digital twins could prevent 15% of projected natural disaster losses to power grids, water systems and transportation infrastructure, amounting to $70 billion in savings worldwide by 2050, according to a recently released Deloitte Center for Sustainable Progress report. Governments and other stakeholders need to overcome technological limitations, financial constraints, regulatory uncertainty, data availability and security concerns before AI-enabled resilience can be widely adopted for infrastructure systems, according to the report. “Investing in AI can help deliver less frequent or shorter power outages, faster system recovery after storms, or fewer damaged or non-usable roads and bridges,” Jennifer Steinmann, Deloitte Global Sustainability Business leader, said in an email. Dive Insight: Natural disasters have caused nearly $200 billion in average annual losses to infrastructure around the world over the past 15 years, according to Deloitte. The report projects that could increase to approximately $460 billion by 2050. Climate change is expected to increase the frequency and intensity of these events, leading to higher losses, according to the report.   “Investing in AI has the greatest near-term potential to help reduce damages from storms, which include tropical cyclones, tornados, thunderstorms, hailstorms, and blizzards,” Steinmann said. “These natural disasters drive the largest share of infrastructure losses, due to their high frequency, wide geographic reach, and increasing intensity.” The AI for Infrastructure Resilience report uses empirical case studies, probabilistic risk modeling and economic forecasting to show how AI can help leaders fortify infrastructure so they can plan, respond and recover more quickly from natural disasters. “AI technologies can offer preventative, detective and responsive solutions to help address natural disasters — but some interventions are more impactful than others,” Steinmann said. Investing in AI while infrastructure is in planning stages accounts for roughly two-thirds of AI’s potential to prevent

Read More »

Lawmaker, AARP call for nationwide utility commission reforms to stop rising electric bills

Dive Brief: Utility commissions across the nation are “broken” and must be reformed to stop rising electric costs, a coalition of lawmakers, consumer and environmental advocates said Tuesday during a joint press conference. Speakers at the press conference said it seemed likely that the Florida Public Service Commission would “rubber stamp” a $9.8 billion base rate increase proposed by Florida Power & Light, and argued that regulators have become too deferential to utility requests. U.S. Rep. Kathy Castor, D-Fla., reintroduced legislation on Tuesday that would prohibit utility companies from using ratepayer dollars to fund political lobbying and advertising. Dive Insight: What started as a Tuesday morning press conference drawing attention to the plight of Floridians struggling to pay rising electric bills quickly escalated to calls for legislative reforms of utility commissions across the nation. “When we have a public service commission that does not look out for the best interest of the customers but rather the utility company itself, there is a problem,” Zayne Smith, senior director of advocacy at AARP Florida, said. “This is a canary in the coal mine if the current ask is granted.” Advocates on the call argued that FPL’s request to increase base rates by 2.5% would harm Florida residents who are already struggling to pay their bills amid other rising costs. Hearings in the case are set to begin this month. Documents obtained during the rate case discovery period suggest that up to a fifth of FPL customers had their power shut off between March 2024 and February 2025 due to unpaid bills, according to Bradley Marshall, an Earthjustice attorney who is representing Florida Rising, the League of United Latin American Citizens and the Environmental Confederation of Southwest Florida in the upcoming rate case proceedings. While FPL argues that the increase would maintain base rates

Read More »

Crude Steadies After Volatile Session

Oil closed unchanged after a choppy session as investors assessed whether a prospective deal by the US and Russia to halt the war in Ukraine would receive international support and materially affect Russian crude flows. West Texas Intermediate swung in a roughly $1.80 range before ending the day flat below $64 a barrel, narrowly breaking a six-session losing streak. The US and Russia are aiming to reach a deal that would lock in Russia’s occupation of territory seized during its invasion, according to people familiar with the matter. Washington is working to get buy-in from Ukraine and its European allies on the agreement, which is far from certain. The US and the European Union have targeted Russia’s oil revenues in response to its invasion of Ukraine, with President Donald Trump just this week doubling levies on all Indian imports to 50% as a penalty for the nation taking Russian crude and threatening similar measures against China. Though investors remain skeptical that Europe would support a deal representing a major victory for Russian President Vladimir Putin, the renewed collaboration between Washington and Moscow has lifted expectations that the nation’s crude will continue to flow freely to its two biggest buyers. Still, the market’s focus has shifted to whether US sanctions on Russia — which have crimped Russia’s ability to sell oil and replenish the Kremlin’s war chest in recent months — will remain in place. “A possible truce would be only modestly bearish crude — assuming there is no lifting of EU and US sanctions against Russian energy — since the market does not currently price in much disruption risk,” said Bob McNally, founder of the Rapidan Energy Group and a former White House official. The proposed deal resembles a ceasefire, not a full-fledged peace agreement, he added. At the same

Read More »

SLB, AIQ Join Forces to Boost ADNOC’s Energy Efficiency with Agentic AI

Schlumberger N.V., the energy tech company doing business as SLB, will team up with AIQ, the Abu Dhabi-based AI specialist for the energy sector. SLB said in a media release that the two companies will collaborate to advance AIQ’s development and deployment of its ENERGYai agentic AI solution across ADNOC’s subsurface operations. Built on 70 years of proprietary data and expertise, ENERGYai integrates large language model (LLM) technology with advanced agentic AI, SLB said. This AI is tailored for specific workflows across ADNOC’s upstream value chain. Initial tests using 15 percent of ADNOC’s data, focusing on two fields, showed a seismic agent that boosted seismic interpretation speed by 10 times and improved accuracy by 70 percent, it said. In partnership, SLB and AIQ will design and deploy new agentic AI workflows across ADNOC’s subsurface operations, including geology, seismic exploration, and reservoir modelling. SLB will provide support with its Lumi data and AI platform, and other digital technologies. A scalable version of ENERGYai is under development, which will include AI agents covering tasks within subsurface operations. Deployment will commence in the fourth quarter of 2025, SLB said. “This partnership reflects our vision to harness AI for energy optimization, and we are enthusiastic that SLB shares this outlook. The collaboration between AIQ and SLB enables the development of sophisticated AI workflows that integrate seamlessly with ADNOC’s infrastructure, driving efficiency, scalability, and innovation at every stage of the energy lifecycle”, Dennis Jol, CEO of AIQ, said. “Our ENERGYai agentic AI solution is pioneering in its sheer scale and impact, and we are proud to involve other significant industry technology players in its development and evolution”. ENERGYai will power agentic AI to automate complex, high-impact tasks, increasing efficiency, enhancing decision-making and optimizing production across ADNOC’s operations, SLB said. The partnership between AIQ and SLB demonstrates a mutual

Read More »

Diamondback Energy Narrows Production Guidance as Net Income Dips in Q2

Diamondback Energy, Inc. reported a net income of $699 million for the second quarter of 2025, well below the $837 million reported in the corresponding quarter of 2024. However, the first half net income of $2.1 billion surged past the $1.6 billion reported in H1 2024. The company said in its report that production for the quarter averaged 919,000 barrels of oil equivalent per day (boe/d). Oil production averaged 495,700 barrels per day (mbo/d). Diamondback said it put 108 wells into production in the Midland basin, and a further eight wells into production in the Delaware Basin. During the first half of the year, Diamondback said that 224 operated wells entered production in the Midland Basin with 15 more wells entering production in the Delaware Basin. In the second quarter of 2025, Diamondback said it had invested $707 million in operated drilling and completions, $90 million in capital workovers, non-operated drilling, completions, and science, and $67 million in infrastructure, environmental, and midstream projects, totaling $864 million in cash capital expenditures. For the first half of 2025, the company spent $1.6 billion on operated drilling and completions, $111 million on capital workovers, non-operated drilling, completions, and science, and $124 million on infrastructure, environmental, and midstream activities, amounting to a total of $1.8 billion in cash capital expenditures, it said. Diamondback has also narrowed its full-year oil production guidance to 485 – 492 mbo/d and increased annual boe guidance by 2 percent to 890 – 910 Mboe/d, it said. Furthermore,  Diamondback noted that the guidance does not reflect the pending acquisition by its publicly traded subsidiary, Viper Energy, Inc., of Sitio Royalties Corp., which is expected to close in the third quarter of 2025, subject to stockholder approval and the fulfillment or waiver of other typical closing conditions. To contact the author,

Read More »

Stargate’s slow start reveals the real bottlenecks in scaling AI infrastructure

The CFO emphasized that SoftBank remains committed to its original target of $346 billion (JPY 500 billion) over 4 years for the Stargate project, noting that major sites have been selected in the US and preparations are taking place simultaneously across multiple fronts. Requests for comment to Stargate partners Nvidia, OpenAI, and Oracle remain unanswered. Infrastructure reality check for CIOs These challenges offer important lessons for enterprise IT leaders facing similar AI infrastructure decisions. Sanchit Vir Gogia, chief analyst and CEO at Greyhound Research, said that Goto’s confirmation of delays “reflects a challenge CIOs see repeatedly” in partner onboarding delays, service activation slips, and revised delivery commitments from cloud and datacenter providers. Oishi Mazumder, senior analyst at Everest Group, noted that “SoftBank’s Stargate delays show that AI infrastructure is not constrained by compute or capital, but by land, energy, and stakeholder alignment.” The analyst emphasized that CIOs must treat AI infrastructure “as a cross-functional transformation, not an IT upgrade, demanding long-term, ecosystem-wide planning.” “Scaling AI infrastructure depends less on the technical readiness of servers or GPUs and more on the orchestration of distributed stakeholders — utilities, regulators, construction partners, hardware suppliers, and service providers — each with their own cadence and constraints,” Gogia said.

Read More »

Incentivizing the Digital Future: Inside America’s Race to Attract Data Centers

Across the United States, states are rolling out a wave of new tax incentives aimed squarely at attracting data centers, one of the country’s fastest-growing industries. Once clustered in only a handful of industry-friendly regions, today’s data-center boom is rapidly spreading, pushed along by profound shifts in federal policy, surging demand for artificial intelligence, and the drive toward digital transformation across every sector of the economy. Nowhere is this transformation more visible than in the intensifying state-by-state competition to land massive infrastructure investments, advanced technology jobs, and the alluring prospect of long-term economic growth. The past year alone has seen a record number of states introducing or expanding incentives for data centers, from tax credits to expedited permitting, reflecting a new era of proactive, tech-focused economic development policy. Behind these moves, federal initiatives and funding packages underscore the essential role of digital infrastructure as a national priority, encouraging states to lower barriers for data center construction and operation. As states watch their neighbors reap direct investment and job creation benefits, a real “domino effect” emerges: one state’s success becomes another’s blueprint, heightening the pressure and urgency to compete. Yet, this wave of incentives also exposes deeper questions about the local impact, community costs, and the evolving relationship between public policy and the tech industry. From federal levels to town halls, there are notable shifts in both opportunities and challenges shaping the landscape of digital infrastructure advancement. Industry Drivers: the Federal Push and Growth of AI The past year has witnessed a profound federal policy shift aimed squarely at accelerating U.S. digital infrastructure, especially for data centers in direct response both to the explosive growth of artificial intelligence and to intensifying international competition. In July 2025, the administration unveiled “America’s AI Action Plan,” accompanied by multiple executive orders that collectively redefined

Read More »

AI Supercharges Hyperscale: Capacity, Geography, and Design Are Being Redrawn

From Cloud to GenAI, Hyperscalers Cement Role as Backbone of Global Infrastructure Data center capacity is undergoing a major shift toward hyperscale operators, which now control 44 percent of global capacity, according to Synergy Research Group. Non-hyperscale colocations account for another 22 percent of capacity and is expected to continue, but hyperscalers projected to hold 61 percent of the capacity by 2030. That swing also reflects the dominance of hyperscalers geographically. In a separate Synergy study revealing the world’s top 20 hyperscale data center locations, just 20 U.S. state or metro markets account for 62 percent of the world’s hyperscale capacity.  Northern Virginia and the Greater Beijing areas alone make up 20 percent of the total. They’re followed by the U.S. states of Oregon and Iowa, Dublin, the U.S. state of Ohio, Dallas, and then Shanghai. Of the top 20 markets, 14 are in the U.S., five in APAC region, and only one is in Europe. This rapid shift is fueled by the explosive growth of cloud computing, artificial intelligence (AI), and especially generative AI (GenAI)—power-intensive technologies that demand the scale, efficiency, and specialized infrastructure only hyperscalers can deliver. What’s Coming for Capacity The capacity research shows on-premises data centers with 34 percent of the total capacity, a significant drop from the 56 percent capacity they accounted for just six years ago.  Synergy projects that by 2030, hyperscale operators such as Google Cloud, Amazon Web Services, and Microsoft Azure will claim 61 percent of all capacity, while on-premises share will drop to just 22 percent. So, it appears on-premises data centers are both increasing and decreasing. That’s one way to put it, but it’s about perspective. Synergy’s capacity study indicates they’re growing as the volume of enterprise GPU servers increases. The shrinkage refers to share of the market: Hyperscalers are growing

Read More »

In crowded observability market, Gartner calls out AI capabilities, cost optimization, DevOps integration

Support for OpenTelemetry and open standards is another differentiator for Gartner. Vendors that embrace these frameworks are better positioned to offer extensibility, avoid vendor lock-in, and enable broader ecosystem integration. This openness is paired with a growing focus on cost optimization—an increasingly important concern as telemetry data volumes increase. Leaders offer granular data retention controls, tiered storage, and usage-based pricing models to help customers Gartner also highlights the importance of the developer experience and DevOps integration. Observability leaders provide “integration with other operations, service management, and software development technologies, such as IT service management (ITSM), configuration management databases (CMDB), event and incident response management, orchestration and automation, and DevOps tools.” On the automation front, observability platforms should support initiating changes to application and infrastructure code to optimize cost, capacity or performance—or to take corrective action to mitigate failures, Gartner says. Leaders must also include application security functionality to identify known vulnerabilities and block attempts to exploit them. Gartner identifies observability leaders This year’s report highlights eight vendors in the leaders category, all of which have demonstrated strong product capabilities, solid technology execution, and innovative strategic vision. Read on to learn what Gartner thinks makes these eight vendors (listed in alphabetical order) stand out as leaders in observability: Chronosphere: Strengths include cost optimization capabilities with its control plane that closely manages the ingestion, storage, and retention of incoming telemetry using granular policy controls. The platform requires no agents and relies largely on open protocols such as OpenTelemetry and Prometheus. Gartner cautions that Chronosphere has not emphasized AI capabilities in its observability platform and currently offers digital experience monitoring via partnerships. Datadog: Strengths include extensive capabilities for managing service-level objectives across data types and providing deep visibility into system and application behavior without the need for instrumentation. Gartner notes the vendor’s licensing

Read More »

LiquidStack CEO Joe Capes on GigaModular, Direct-to-Chip Cooling, and AI’s Thermal Future

In this episode of the Data Center Frontier Show, Editor-in-Chief Matt Vincent speaks with LiquidStack CEO Joe Capes about the company’s breakthrough GigaModular platform — the industry’s first scalable, modular Coolant Distribution Unit (CDU) purpose-built for direct-to-chip liquid cooling. With rack densities accelerating beyond 120 kW and headed toward 600 kW, LiquidStack is targeting the real-world requirements of AI data centers while streamlining complexity and future-proofing thermal design. “AI will keep pushing thermal output to new extremes,” Capes tells DCF. “Data centers need cooling systems that can be easily deployed, managed, and scaled to match heat rejection demands as they rise.” LiquidStack’s new GigaModular CDU, unveiled at the 2025 Datacloud Global Congress in Cannes, delivers up to 10 MW of scalable cooling capacity. It’s designed to support single-phase direct-to-chip liquid cooling — a shift from the company’s earlier two-phase immersion roots — via a skidded modular design with a pay-as-you-grow approach. The platform’s flexibility enables deployments at N, N+1, or N+2 resiliency. “We designed it to be the only CDU our customers will ever need,” Capes says. From Immersion to Direct-to-Chip LiquidStack first built its reputation on two-phase immersion cooling, which Joe Capes describes as “the highest performing, most sustainable cooling technology on Earth.” But with the launch of GigaModular, the company is now expanding into high-density, direct-to-chip cooling, helping hyperscale and colocation providers upgrade their thermal strategies without overhauling entire facilities. “What we’re trying to do with GigaModular is simplify the deployment of liquid cooling at scale — especially for direct-to-chip,” Capes explains. “It’s not just about immersion anymore. The flexibility to support future AI workloads and grow from 2.5 MW to 10 MW of capacity in a modular way is absolutely critical.” GigaModular’s components — including IE5 pump modules, dual BPHx heat exchangers, and intelligent control systems —

Read More »

Oracle’s Global AI Infrastructure Strategy Takes Shape with Bloom Energy and Digital Realty

Bloom Energy: A Leading Force in On-Site Power As of mid‑2025, Bloom Energy has deployed over 400 MW of capacity at data centers worldwide, working with partners including Equinix, American Electric Power (AEP), and Quanta Computing. In total, Bloom has delivered more than 1.5 GW of power across 1,200+ global installations, a tripling of its customer base in recent years. Several key partnerships have driven this rapid adoption. A decade-long collaboration with Equinix, for instance, began with a 1 MW pilot in 2015 and has since expanded to more than 100 MW deployed across 19 IBX data centers in six U.S. states, providing supplemental power at scale. Even public utilities are leaning in: in late 2024, AEP signed a deal to procure up to 1 GW of Bloom’s solid oxide fuel cell (SOFC) systems for fast-track deployments aimed at large data centers and commercial users facing grid connection delays. More recently, on July 24, 2025, Bloom and Oracle Cloud Infrastructure (OCI) announced a strategic partnership to deploy SOFC systems at select U.S. Oracle data centers. The deployments are designed to support OCI’s gigawatt-scale AI infrastructure, delivering clean, uninterrupted electricity for high-density compute workloads. Bloom has committed to providing sufficient on-site power to fully support an entire data center within 90 days of contract signing. With scalable, modular, and low-emissions energy solutions, Bloom Energy has emerged as a key enabler of next-generation data center growth. Through its strategic partnerships with Oracle, Equinix, and AEP, and backed by a rapidly expanding global footprint, Bloom is well-positioned to meet the escalating demand for multi-gigawatt on-site generation as the AI era accelerates. Oracle and Digital Realty: Accelerating the AI Stack Oracle, which continues to trail hyperscale cloud providers like Google, AWS, and Microsoft in overall market share, is clearly betting big on AI to drive its next phase of infrastructure growth.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn't the only one ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in upgrading existing facilities or opening new ones to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft's capital spending on AI, at $62.4 billion for calendar 2025, is lower than Microsoft president Brad Smith's claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are far higher than Microsoft's 2020 capital expenditure of "just" $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and technology across agriculture, construction, and commercial landscaping. The Moline, Illinois-based company has been in business for 187 years, yet it has become a regular among the non-tech companies showing off technology at CES in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech.

The message from the company is that there aren't enough skilled farm laborers to do the work its customers need. That has been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, even as the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.)

John Deere's autonomous 9RX tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all of them is the availability of skilled labor. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can't find labor to fill open positions, he said. "They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year saw rapid innovation, and this year will see the same. That makes it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize in their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they are indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier LLMs themselves improved. "Let me put it this way," said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for enterprises and recently reviewed the 48 agents it built last year. "Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better." Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they are also being trained to do agentic tasks. Another feature model providers are researching is using an LLM as a judge; as models get cheaper (something we'll cover below), companies can use three or more models to
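The LLM-as-judge pattern mentioned above is easy to sketch. Below is a minimal, hypothetical Python example, not any vendor's API: the worker and judge callables are toy stand-ins for real chat-model calls, and the scoring prompt is invented for illustration.

    from statistics import mean
    from typing import Callable

    # Hypothetical stand-in for a real chat-completion call.
    ModelFn = Callable[[str], str]

    JUDGE_PROMPT = (
        "Rate the following answer from 1 (poor) to 10 (excellent). "
        "Reply with the number only.\n\nQuestion: {q}\n\nAnswer: {a}"
    )

    def judged_answer(q: str, worker: ModelFn, judges: list[ModelFn]) -> tuple[str, float]:
        """Generate an answer, then average scores from several cheaper judge models."""
        answer = worker(q)
        scores = []
        for judge in judges:
            reply = judge(JUDGE_PROMPT.format(q=q, a=answer))
            try:
                scores.append(float(reply.strip()))
            except ValueError:
                pass  # skip judges that don't return a parseable number
        return answer, mean(scores) if scores else 0.0

    # Toy stand-ins so the sketch runs without any API keys:
    worker = lambda prompt: "Paris is the capital of France."
    judges = [lambda p: "9", lambda p: "8", lambda p: "10"]
    print(judged_answer("What is the capital of France?", worker, judges))

Averaging several cheap judges, as the excerpt suggests, trades a little extra inference cost for a more robust quality signal than any single model's verdict.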

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams' advanced capabilities in two areas: multi-step reinforcement learning and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability, and safety of AI models using these techniques. The first paper, "OpenAI's Approach to External Red Teaming for AI Models and Systems," reports that specialized teams outside the company have proven effective at uncovering vulnerabilities that in-house testing may have missed and that might otherwise have made it into a released model. In the second paper, "Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning," OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It's encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, and OpenAI, as well as the U.S. National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI's paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see whether knowledgeable external teams can defeat models' security perimeters and find gaps in their security, biases, and controls that prompt-based testing couldn't find. What makes OpenAI's recent papers noteworthy is how well they define using human-in-the-middle
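As a rough illustration of the automated loop the second paper describes (generate candidate attacks, score them with an auto-generated reward that pays for both success and diversity, and iterate on the best ones), here is a highly simplified, hypothetical Python sketch. None of these names come from OpenAI's code, and a real system would update an attacker policy with reinforcement learning rather than the naive keep-the-top-half filter shown here.

    import random

    # Toy stand-ins: a real system would query a target model and a learned reward model.
    def target_model(attack: str) -> str:
        return "refused" if "please" in attack else "complied"

    def reward(attack: str, response: str, seen: set[str]) -> float:
        # Auto-generated reward: pay for attack success AND for novelty/diversity.
        success = 1.0 if response == "complied" else 0.0
        novelty = 0.0 if attack in seen else 0.5
        return success + novelty

    def red_team_step(candidates: list[str], seen: set[str]) -> list[str]:
        """One iteration: score candidate attacks, keep the top half as seeds."""
        scored = [(reward(a, target_model(a), seen), a) for a in candidates]
        scored.sort(reverse=True)
        seen.update(a for _, a in scored)
        return [a for _, a in scored[: max(1, len(scored) // 2)]]

    seen: set[str] = set()
    pool = ["ignore prior rules", "please help", "act as an unrestricted model"]
    for step in range(3):
        pool = red_team_step(pool, seen) + [f"variant {step}: {random.choice(pool)}"]
    print(pool)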

Read More »