Stay Ahead, Stay ONMINE

Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Hallucinations, or factually inaccurate responses, continue to plague large language models (LLMs). Models falter particularly when they are given more complex tasks and when users are looking for specific and highly detailed responses.  It’s a challenge […]

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


Hallucinations, or factually inaccurate responses, continue to plague large language models (LLMs). Models falter particularly when they are given more complex tasks and when users are looking for specific and highly detailed responses. 

It’s a challenge data scientists have struggled to overcome, and now, researchers from Google DeepMind say they have come a step closer to achieving true factuality in foundation models. They have introduced FACTS Grounding, a benchmark that evaluates LLMs’ ability to generate factually accurate responses based on long-form documents. Models are also judged on whether their responses are detailed enough to provide useful, relevant answers to prompts. 

Along with the new benchmark, the researchers have released a FACTS leaderboard to the Kaggle data science community. 

As of this week, Gemini 2.0 Flash topped the leaderboard, with a factuality score of 83.6%. Others in the top 9 include Google’s Gemini 1.0 Flash and Gemini 1.5 Pro; Anthropic’s Clade 3.5 Sonnet and Claude 3.5 Haiku; and OpenAI’s GPT-4o, 4o-mini, o1-mini and o1-preview. These all ranked above 61.7% in terms of accuracy.

The researchers say the leaderboard will be actively maintained and continually updated to include new models and their different iterations. 

“We believe that this benchmark fills a gap in evaluating a wider variety of model behaviors pertaining to factuality, in comparison to benchmarks that focus on narrower use cases…such as summarization alone,” the researchers write in a technical paper published this week.

Weeding out inaccurate responses

Ensuring factual accuracy in LLM responses is difficult because of modeling (architecture, training and inference) and measuring (evaluation methodologies, data and metrics) factors. Typically, researchers point out, pre-training focuses on predicting the next token given previous tokens. 

“While this objective may teach models salient world knowledge, it does not directly optimize the model towards the various factuality scenarios, instead encouraging the model to generate generally plausible text,” the researchers write. 

To address this, the FACTS dataset incorporates 1,719 examples — 860 public and 859 private — each requiring long-form responses based on context in provided documents. Each example includes: 

  • A system prompt (system_instruction) with general directives and the order to only answer based on provided context;
  • A task (user_request) that includes a specific question to be answered; 
  • A long document (context_document) with necessary information. 

To succeed and be labeled “accurate,” the model must process the long-form document and create a subsequent long-form response that is both comprehensive and fully attributable to the document. Responses are labeled “inaccurate” if the model’s claims are not directly supported by the document and not highly relevant or useful. 

For example, a user may ask a model to summarize the main reasons why a company’s revenue decreased in Q3, and provide it with detailed information including a company’s annual financial report discussing quarterly earnings, expenses, planned investments and market analysis. 

If a model then, say, returned: “The company faced challenges in Q3 that impacted its revenue,” it would be deemed inaccurate. 

“The response avoids specifying any reasons, such as market trends, increased competition or operational setbacks, which would likely be in the document,” the researchers point out. “It doesn’t demonstrate an attempt to engage with or extract relevant details.” 

By contrast, if a user prompted, “What are some tips on saving money?” and provided a compilation of categorized money-saving tips for college students, a correct response would be highly detailed: “Utilize free activities on campus, buy items in bulk and cook at home. Also, set spending goals, avoid credit cards and conserve resources.” 

DeepMind uses LLMs to judge LLMs

To allow for diverse inputs, researchers included documents of varying lengths, up to 32,000 tokens (or the equivalent of 20,000 words). These cover areas including finance, technology, retail, medicine and law. User requests are also broad, including Q&A generation, requests for summarization and rewriting. 

Each example is judged in two phases. First, responses are evaluated for eligibility: If they don’t satisfy user requests, they are disqualified. Second, responses must be hallucination-free and fully grounded in the documents provided.

These factuality scores are calculated by three different LLM judges — specifically Gemini 1.5 Pro, GPT-4o and Claude 3.5 Sonnet — that determine individual scores based on the percentage of accurate model outputs. Subsequently, the final factuality determination is based on an average of the three judges’ scores.

Researchers point out that models are often biased towards other members of their model family — at a mean increase of around 3.23% — so the combination of different judges was critical to help ensure responses were indeed factual.

Ultimately, the researchers emphasize that factuality and grounding are key factors to the future success and usefulness of LLMs. “We believe that comprehensive benchmarking methods, coupled with continuous research and development, will continue to improve AI systems,” they write. 

However, they also concede: “We are mindful that benchmarks can be quickly overtaken by progress, so this launch of our FACTS Grounding benchmark and leaderboard is just the beginning.” 

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

IBM, Arm team up to bring Arm software to IBM Z mainframes

That gap is precisely what the collaboration is intended to address, said Rachita Rao, senior analyst at Everest Group. “This is a mainframe adjacency play,” she said. “The intent is to extend IBM Z and LinuxONE environments by enabling Arm-compatible workloads to run closer to systems of record. While hyperscalers

Read More »

Energy Department Initiates Additional Strategic Petroleum Reserve Emergency Exchange to Stabilize Global Oil Supply

WASHINGTON—The U.S. Department of Energy (DOE) issued a Request for Proposal (RFP) today for an emergency exchange of 10-million-barrels from the Strategic Petroleum Reserve (SPR). This action is part of the coordinated release of 400-million-barrels from IEA member nations’ strategic reserves President Trump previously announced. The United States continues to deliver on its 172-million-barrel release commitment.  The crude oil will originate from the Strategic Petroleum Reserve’s (SPR) Bryan Mound site. Today’s action builds on the initial phase of the Emergency Exchange, which moved quickly to award 45.2 million barrels from the Bayou Choctaw, Bryan Mound, and West Hackberry SPR sites. The 10-million-barrel exchange leverages the full capabilities of the SPR, alongside the President’s limited Jones Act waiver, to accelerate critical near-term oil flows into the market.  “Today’s action furthers the United States’ efforts to move oil quickly to the market and mitigate short-term supply disruptions,” said DOE Assistant Secretary of the Hydrocarbons and Geothermal Energy Office Kyle Haustveit. “Thanks to President Trump, America is managing our national security assets responsibly again. Through this exchange, we will continue to refill the Strategic Petroleum Reserve by bringing additional barrels back at a later date through this pragmatic exchange structure, strengthening its long-term readiness and all at no cost to the American taxpayer.”  Under DOE’s exchange authority, participating companies will return the borrowed 10 million barrels with additional premium barrels by next year. This exchange delivers immediate crude to refiners and the market while generating additional barrels for the American people at no cost to taxpayers.   Bids for the solicitation are due no later than 11:00 A.M. CT on Monday, April 6, 2026.    For more information on the SPR, please visit DOE’s website.   

Read More »

Trump Administration Keeps Colorado Coal Plant Open to Ensure Affordable, Reliable and Secure Power in Colorado

WASHINGTON—U.S. Secretary of Energy Chris Wright today issued an emergency order to keep a Colorado coal plant operational to ensure Americans maintain access to affordable, reliable and secure electricity. The order directs Tri-State Generation and Transmission Association (Tri-State), Platte River Power Authority, Salt River Project, PacifiCorp, and Public Service Company of Colorado (Xcel Energy), in coordination with the Western Area Power Administration (WAPA) Rocky Mountain Region and Southwest Power Pool (SPP), to take all measures necessary to ensure that Unit 1 at the Craig Station in Craig, Colorado is available to operate. Unit One of the coal plant was scheduled to shut down at the end of 2025 but on December 30, 2025, Secretary Wright issued an emergency order directing Tri-State and the co-owners to ensure that Unit 1 at the Craig Station remains available to operate. “The last administration’s energy subtraction policies threatened America’s energy security and positioned our nation to likely experience significantly more blackouts in the coming years—thankfully, President Trump won’t let that happen,” said Energy Secretary Wright. “The Trump Administration will continue taking action to ensure we don’t lose critical generation sources. Americans deserve access to affordable, reliable, and secure energy to power their homes all the time, regardless of whether the wind is blowing or the sun is shining.” Thanks to President Trump’s leadership, coal plants across the country are reversing plans to shut down. In 2025, more than 17 gigawatts (GW) of coal-power electricity generation were saved. On April 1, once Tri-State and the WAPA Rocky Mountain Region join the SPP RTO West expansion, SPP is directed to take every step to employ economic dispatch to minimize costs to ratepayers. According to DOE’s Resource Adequacy Report, blackouts were on track to potentially increase 100 times by 2030 if the U.S. continued to take reliable

Read More »

NextDecade contractor Bechtel awards ABB more Rio Grande LNG automation work

NextDecade Corp. contractor Bechtel Corp. has awarded ABB Ltd. additional integrated automation and electrical solution orders, extending its scope to Trains 4 and 5 of NextDecade’s 30-million tonne/year (tpy)  Rio Grande LNG (RGLNG) plant in Brownsville, Tex. The orders were booked in third- and fourth-quarters 2025 and build on ABB’s Phase 1 work with Trains 1-3, totaling 17 million tpy.  The scope for RGLNG Trains 4 and 5 includes deployment of an integrated control and safety system consisting of a distributed control system, emergency shutdown, and fire and gas systems. An electrical controls and monitoring system will provide unified visibility of the plant’s electrical infrastructure. These two overarching solutions will provide a common automation platform. ABB will also supply medium-voltage drives, synchronous motors, transformers, motor controllers and switchgear.  The orders also include local equipment buildings—two for Train 4 and one for Train 5— housing critical control and electrical systems in prefabricated modules to streamline installation and commissioning on site. The solutions being delivered to Bechtel use ABB adaptive execution, a methodology for capital projects designed to optimize engineering work and reduce delivery timelines. Phase 1 of RGLNG is under construction and expected to begin operations in 2027. Operations at Train 4 are expected in 2030 and Train 5 in 2031. ABB’s senior vice-president for the Americas, Scott McCay, confirmed to Oil & Gas Journal at CERAWeek by S&P Global in Houston that the company is doing similar work through Tecnimont for Argent LNG’s planned 25-million tpy plant in Port Fourchon, La.; 10-million tpy Phase 1 and 15-million tpy Phase 2. Argent is targeting 2030 completion for its plant.

Read More »

Persistent oil flow imbalances drive Enverus to increase crude price forecast

Citing impacts from the Iran war, near-zero flows through the Strait of Hormuz, accelerating global stock draws, and expectations for a muted US production response despite higher prices, Enverus Intelligence Research (EIR) raised its Brent crude oil price forecast. EIR now expects Brent to average $95/bbl for the remainder of 2026 and $100/bbl in 2027, reflecting what it described as a persistent global oil flow imbalance that continues to draw down inventories. “The world has an oil flow problem that is draining stocks,” said Al Salazar, director of research at EIR. “Whenever that oil flow problem is resolved, the world is left with low stocks. That’s what drives our oil price outlook higher for longer.” The outlook assumes the Strait of Hormuz remains largely closed for 3 months. EIR estimates that each month of constrained flows shifts the price outlook by about $10–15/bbl, underscoring the scale of the disruption and uncertainty around its duration. Despite West Texas Intermediate (WTI) prices of $90–100/bbl, EIR does not expect US producers to materially increase output. The firm forecasts US liquids production growth of 370,000 b/d by end-2026 and 580,000 b/d by end-2027, citing drilling-to-production lags, industry consolidation, and continued capital discipline. Global oil demand growth for 2026 has been reduced to about 500,000 b/d from 1.0 million b/d as higher energy prices and anticipated supply disruptions weigh on economic activity. Cumulative global oil stock draws are estimated at roughly 1 billion bbl through 2027, with non-OECD inventories—particularly in Asia—absorbing nearly half of the impact. A 60-day Jones Act waiver may provide limited short-term US shipping flexibility, but EIR said the measure is unlikely to materially affect global oil prices given broader market forces.

Read More »

Equinor begins drilling $9-billion natural gas development project offshore Brazil

Equinor has started drilling the Raia natural gas project in the Campos basin presalt offshore Brazil. The $9-billion project is Equinor’s largest international investment, its largest project under execution, and marks the deepest water depth operation in its portfolio. The drilling campaign, which began Mar. 24 with the Valaris DS‑17 drillship, includes six wells in the Raia area 200 km offshore in water depths of around 2,900 m. The area is expected to hold recoverable natural gas and condensate reserves of over 1 billion boe. Raia’s development concept is based on production through wells connected to a 126,000-b/d floating production, storage and offloading unit (FPSO), which will treat produced oil/condensate and gas. Natural gas will be transported through a 200‑km pipeline from the FPSO to Cabiúnas, in the city of Macaé, Rio de Janeiro state. Once in operation, expected in 2028, the project will have the capacity to export up to 16 million cu m/day of natural gas, which could represent 15% of Brazil’s natural gas demand, the company said in a release Mar. 24. “While drilling takes place, integration and commissioning activities on the FPSO are progressing well putting us on track towards a safe start of operations in 2028,” said Geir Tungesvik, executive vice-president, projects, drilling and procurement, Equinor. The Raia project is operated by Equinor (35%), in partnership with Repsol Sinopec Brasil (35%) and Petrobras (30%).

Read More »

Woodfibre LNG receives additional modules as construction advances

Woodfibre LNG LP has received two major modules within a week for its under‑construction, 2.1‑million tonne/year (tpy) LNG export plant near Squamish, British Columbia, advancing construction to about 65% complete. The deliveries include the liquefaction module—the project’s heaviest and most critical process unit—and the powerhouse module, which will serve as the plant’s central power and control hub. The liquefaction module, delivered aboard the heavy cargo vessel Red Zed 1, is the 15th of 19 modules scheduled for installation at the site, the company said in a Mar. 24 release. Weighing about 10,847 metric tonnes and occupying a footprint roughly equivalent to a football field, it is among the largest modules fabricated for the project. Once installed and commissioned, the liquefaction module will cool natural gas to about –162°C, converting it into LNG for export. Shortly after the liquefaction module’s arrival, Woodfibre LNG received the powerhouse module, the 16th module delivered to site. Weighing more than 4,200 metric tonnes, the powerhouse module will function as a power and control system, receiving electricity from BC Hydro and managing and distributing power to the plant’s electric‑drive compressors. The Woodfibre LNG project is designed as the first LNG export plant to use electric‑drive motors for liquefaction, replacing conventional gas‑turbine‑driven compressors. The Siemens electric‑drive system will be powered by renewable hydroelectricity from BC Hydro, eliminating the largest operational source of greenhouse gas emissions typically associated with liquefaction, the company said. The project is being built near the community of Squamish on the traditional territory of the Sḵwx̱wú7mesh Úxwumixw (Squamish Nation) and is regulated in part by the Indigenous government.  All 19 modules are expected to arrive on site by spring 2026. Construction is scheduled for completion in 2027. Woodfibre LNG is owned by Woodfibre LNG Ltd. Partnership, which is 70% owned by Pacific Energy Corp.

Read More »

Data Center Jobs: Engineering, Construction, Commissioning, Sales, Field Service and Facility Tech Jobs Available in Major Data Center Hotspots

Each month Data Center Frontier, in partnership with Pkaza, posts some of the hottest data center career opportunities in the market. Here’s a look at some of the latest data center jobs posted on the Data Center Frontier jobs board, powered by Pkaza Critical Facilities Recruiting. Looking for Data Center Candidates? Check out Pkaza’s Active Candidate / Featured Candidate Hotlist Power Applications Engineer Pittsburgh, PA This position is also available in: Denver, CO and Andrews, SC.  Our client is a leading provider and manufacturer of industrial electrical power equipment used in industrial applications for mission critical operations. They help their customers save money by reducing energy and operating costs and provide solutions for modernizing their customer’s existing electrical infrastructure. This company provides cooling solutions to many of the world’s largest organizations and government facilities and enterprise clients, colocation providers and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive salaries and benefits. Electrical Commissioning Engineer Ashburn, VA This traveling position is also available in: New York, NY; White Plains, NY;  Dallas, TX; Richmond, VA; Montvale, NJ; Charlotte, NC; Atlanta, GA; Hampton, GA; New Albany, OH; Cedar Rapids, IA; Phoenix, AZ; Salt Lake City, UT;  Kansas City, MO; Omaha, NE; Chesterton, IN or Chicago, IL. *** ALSO looking for a LEAD EE and ME CxA Agents and CxA PMs. ***  Our client is an engineering design and commissioning company that has a national footprint and specializes in MEP critical facilities design. They provide design, commissioning, consulting and management expertise in the critical facilities space. They have a mindset to provide reliability, energy efficiency, sustainable design and LEED expertise when providing these consulting services for enterprise, colocation and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive

Read More »

No joke: data centers are warming the planet

The researchers also made use of a database provided by the International Energy Agency (IEA) that the authors pointed out contains more than 11,000 locations worldwide, of which 8,472 have been detected to dwell outside of highly dense urban areas. The latter locations were then used to “quantify the effect of data centers on the environment in terms of the LST gradient that could be measured on the areas surrounding each data center.” Asking the wrong question Asked if AI data centers are really causing local warming, or if this phenomenon is overstated, Sanchit Vir Gogia, chief analyst at Greyhound Research, said, “the signal is real, but the industry is asking the wrong question. The research shows a consistent rise in land surface temperature of around 2°C  following the establishment of large data centre facilities.” The debate, however, “has quickly shifted to causality: whether this is driven by operational heat from compute, or by land transformation during construction. That distinction matters scientifically, but it does not change the strategic implication.” Land surface temperature, said Gogia, is not the same as air temperature, and that gap will be used to challenge the findings. “But dismissing the signal on that basis would be a mistake,” he noted. “Data centers concentrate energy use, replace natural surfaces with heat-retaining materials, and continuously reject heat into the environment. Those are known drivers of thermal change.” He added, “the uncomfortable truth is this: Even if the exact mechanism is debated, the outcome aligns with first principles. Infrastructure at this scale alters its surroundings. The industry does not yet have a clean way to separate construction impact from operational impact, and that ambiguity makes the risk harder to model, not easier. This is not overstated, it is under-interpreted.” Location strategy must change But will the findings change

Read More »

Schneider Electric Maps the AI Data Center’s Next Design Era

The coming shift to higher-voltage DC That internal power challenge led Simonelli to one of the most consequential architectural topics in the interview: the likely transition toward higher-voltage DC distribution at very high rack densities. He framed it pragmatically. At current density levels, the industry knows how to get power into racks at 200 or 300 kilowatts. But as densities rise toward 400 kilowatts and beyond, conventional AC approaches start to run into physical limits. Too much cable, too much copper, too much conversion equipment, and too much space consumed by power infrastructure rather than GPUs. At that point, he said, higher-voltage DC becomes attractive not for philosophical reasons, but because it reduces current, shrinks conductor size, saves space, and leaves more room for revenue-generating compute. “It is again a paradigm shift,” Simonelli said of DC power at these densities. “But it won’t be everywhere.” That is probably right. The transition will not be universal, and the exact thresholds will evolve. But his underlying point is powerful. As rack densities climb, electrical architecture starts to matter not only for efficiency and reliability, but for physical space allocation inside the rack. Put differently, power distribution becomes a compute-enablement issue. Distance between accelerators matters, too. The closer GPUs and TPUs can be kept together, the better they perform. If power infrastructure can be compacted, more of the rack can be devoted to dense compute, improving the economics and performance of the system. That is a strong example of how AI is collapsing traditional boundaries between facility engineering and compute architecture. The two are no longer cleanly separable. Gas now, renewables over time On onsite power, Simonelli was refreshingly direct. If the goal is dispatchable onsite generation at the scale now being contemplated for AI facilities, he said, “there really isn’t an alternative

Read More »

SoftBank’s 10 GW Ohio Campus Marks a Turning Point for AI Infrastructure

Renewables can reduce carbon intensity, but they cannot independently meet the need for continuous, multi-gigawatt firm capacity without large-scale storage and balancing resources. For developers targeting guaranteed availability within this decade, natural gas remains the most readily deployable option, despite the political and environmental tradeoffs it introduces. AEP and the Cost Allocation Model If the generation plan explains the engineering logic, the AEP structure speaks to the political one. At the center is one of the most contested questions in the data center market: who pays for the transmission and grid upgrades required to serve large new loads? Utilities, regulators, consumer advocates, and large-load customers are increasingly divided on this issue. Data center developers point to economic development benefits, including jobs and tax revenue. Consumer advocates counter that residential ratepayers should not subsidize infrastructure built primarily to serve hyperscale demand. The Ohio arrangement is being positioned as a response to that conflict. DOE states that SB Energy and AEP Ohio are partnering on $4.2 billion in new transmission infrastructure, with SB Energy committing to fund those investments rather than passing costs through to ratepayers. AEP has echoed that position, indicating the structure is intended to avoid upward pressure on transmission rates for Ohio customers. Whether that outcome holds will depend on regulatory review and execution. But the structure itself is significant. It frames a model in which large-load developers directly fund the transmission infrastructure required to support their projects, rather than relying on broader cost recovery mechanisms. That makes the project more than a construction milestone. It positions it as a potential policy template. If validated, this approach could influence how utilities and regulators across the U.S. address cost allocation for AI-scale infrastructure, particularly as similar disputes intensify in constrained grid regions. Why 765-kV Transmission Signals Scale AEP says the

Read More »

Q1 Executive Roundtable Recap

Matt Vincent is Editor in Chief of Data Center Frontier, where he leads editorial strategy and coverage focused on the infrastructure powering cloud computing, artificial intelligence, and the digital economy. A veteran B2B technology journalist with more than two decades of experience, Vincent specializes in the intersection of data centers, power, cooling, and emerging AI-era infrastructure. Since assuming the EIC role in 2023, he has helped guide Data Center Frontier’s coverage of the industry’s transition into the gigawatt-scale AI era, with a focus on hyperscale development, behind-the-meter power strategies, liquid cooling architectures, and the evolving energy demands of high-density compute, while working closely with the Digital Infrastructure Group at Endeavor Business Media to expand the brand’s analytical and multimedia footprint. Vincent also hosts The Data Center Frontier Show podcast, where he interviews industry leaders across hyperscale, colocation, utilities, and the data center supply chain to examine the technologies and business models reshaping digital infrastructure. Since its inception he serves as Head of Content for the Data Center Frontier Trends Summit. Before becoming Editor in Chief, he served in multiple senior editorial roles across Endeavor Business Media’s digital infrastructure portfolio, with coverage spanning data centers and hyperscale infrastructure, structured cabling and networking, telecom and datacom, IP physical security, and wireless and Pro AV markets. He began his career in 2005 within PennWell’s Advanced Technology Division and later held senior editorial positions supporting brands such as Cabling Installation & Maintenance, Lightwave Online, Broadband Technology Report, and Smart Buildings Technology. Vincent is a frequent moderator, interviewer, and keynote speaker at industry events including the HPC Forum, where he delivers forward-looking analysis on how AI and high-performance computing are reshaping digital infrastructure. He graduated with honors from Indiana University Bloomington with a B.A. in English Literature and Creative Writing and lives in southern New Hampshire with

Read More »

Executive Roundtable: The AI Infrastructure Credibility Test

For the fourth installment of DCF’s Executive Roundtable for the First Quarter of 2026, we turn to a question that increasingly sits alongside power and capital as a defining constraint. Credibility. As AI-driven data center development accelerates, public scrutiny is rising in parallel. Communities, regulators, and policymakers are taking a closer look at the industry’s footprintin terms of its energy consumption, its land use, and its broader impact on local infrastructure and ratepayers. What was once a relatively low-profile sector has become a visible and, at times, contested presence in regional economies. This shift reflects the sheer scale of the current build cycle. Multi-hundred-megawatt and gigawatt campuses are no longer theoretical in any sense. They are actively being proposed and constructed across key markets. With that scale comes heightened expectations around transparency, accountability, and tangible community benefit. At the same time, the industry faces a more complex regulatory and political landscape. Questions around grid capacity, rate structures, environmental impact, and economic incentives are increasingly being debated in public forums, from state utility commissions to local zoning boards. In this environment, the ability to secure approvals is no longer assured, even in historically favorable markets. The concept of a “social license to operate” has therefore moved to the forefront. Beyond technical execution, developers and operators must now demonstrate that AI infrastructure can be deployed in a way that aligns with community priorities and delivers shared value. In this roundtable, our panel of industry leaders explores what will define that credibility in the years ahead and what the data center industry must do to sustain its momentum in an era of growing public scrutiny.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »