
AI lie detector: How HallOumi’s open-source approach to hallucination could unlock enterprise AI adoption




In the race to deploy enterprise AI, one obstacle consistently blocks the path: hallucinations. These fabricated responses from AI systems have caused everything from legal sanctions for attorneys to companies being forced to honor fictitious policies. 

Organizations have tried different approaches to solving the hallucination challenge, including fine-tuning with better data, retrieval augmented generation (RAG), and guardrails. Open-source development firm Oumi is now offering a new approach, albeit with a somewhat ‘cheesy’ name.

The company’s name, Oumi, is an acronym for Open Universal Machine Intelligence. Led by ex-Apple and Google engineers, the company is on a mission to build an unconditionally open-source AI platform.

On April 2, the company released HallOumi, an open-source claim-verification model designed to solve the accuracy problem through a novel approach to hallucination detection. Halloumi is, of course, a type of cheese, but the cheese has nothing to do with the model’s naming: the name is a combination of “hallucination” and “Oumi.” Given the release’s timing close to April Fools’ Day, some might have suspected a joke, but it is anything but; it addresses a very real problem.

“Hallucinations are frequently cited as one of the most critical challenges in deploying generative models,” Manos Koukoumidis, CEO of Oumi, told VentureBeat. “It ultimately boils down to a matter of trust—generative models are trained to produce outputs which are probabilistically likely, but not necessarily true.”

How HallOumi works to solve enterprise AI hallucinations 

HallOumi analyzes AI-generated content on a sentence-by-sentence basis. The system accepts both a source document and an AI response, then determines whether the source material supports each claim in the response.

“What HallOumi does is analyze every single sentence independently,” Koukoumidis explained. “For each sentence it analyzes, it tells you the specific sentences in the input document that you should check, so you don’t need to read the whole document to verify if what the LLM [large language model] said is accurate or not.”

The model provides three key outputs for each analyzed sentence:

  • A confidence score indicating the likelihood of hallucination.
  • Specific citations linking claims to supporting evidence.
  • A human-readable explanation detailing why the claim is supported or unsupported.
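HallOumi’s actual API is not shown in the article, so the sketch below is purely illustrative: it mimics the shape of the three per-sentence outputs (confidence score, citations, explanation) using a naive lexical-overlap check in place of the real model. The `verify_claim` function and `ClaimResult` type are hypothetical names, not Oumi’s interface.

```python
from dataclasses import dataclass

@dataclass
class ClaimResult:
    confidence: float      # likelihood the claim is supported (1.0 = fully supported)
    citations: list[int]   # indices of source sentences worth checking
    explanation: str       # human-readable rationale

def verify_claim(source_sentences: list[str], claim: str) -> ClaimResult:
    """Toy stand-in for the model: score each source sentence by word overlap."""
    claim_words = set(claim.lower().split())
    overlaps = [
        (i, len(claim_words & set(s.lower().split())) / max(len(claim_words), 1))
        for i, s in enumerate(source_sentences)
    ]
    cited = [i for i, score in overlaps if score >= 0.5]
    confidence = max((score for _, score in overlaps), default=0.0)
    explanation = (
        "Claim overlaps substantially with the cited source sentences."
        if cited else "No source sentence supports this claim."
    )
    return ClaimResult(confidence, cited, explanation)
```

A real deployment would replace the overlap heuristic with a call to the HallOumi model, but the output structure an integrator consumes would look much like this.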

“We have trained it to be very nuanced,” said Koukoumidis. “Even for our linguists, when the model flags something as a hallucination, we initially think it looks correct. Then when you look at the rationale, HallOumi points out exactly the nuanced reason why it’s a hallucination—why the model was making some sort of assumption, or why it’s inaccurate in a very nuanced way.”

Integrating HallOumi into enterprise AI workflows

There are several ways that HallOumi can be used and integrated with enterprise AI today.

One option is to try out the model through a somewhat manual process, using the online demo interface.

An API-driven approach is better suited to production and enterprise AI workflows. Koukoumidis explained that the model is fully open source and can be plugged into existing workflows, run locally or in the cloud, and used with any LLM.

The process involves feeding the original context and the LLM’s response to HallOumi, which then verifies the output. Enterprises can integrate HallOumi to add a verification layer to their AI systems, helping to detect and prevent hallucinations in AI-generated content.
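The verification-layer pattern described above can be sketched as a thin wrapper around the generation step. The `check_sentence` stub below stands in for a call to a locally hosted HallOumi endpoint (a hypothetical interface, not Oumi’s published API); the wrapper splits the LLM response into sentences and flags any that fall below a confidence threshold.

```python
def check_sentence(context: str, sentence: str) -> float:
    """Stub scorer: fraction of the sentence's words found in the context.
    In a real pipeline this would call the HallOumi model instead."""
    words = sentence.lower().split()
    ctx = context.lower()
    return sum(w in ctx for w in words) / max(len(words), 1)

def verify_response(context: str, llm_response: str, threshold: float = 0.7):
    """Post-generation verification layer: flag likely hallucinations."""
    sentences = [s.strip() + "." for s in llm_response.split(".") if s.strip()]
    return [
        {
            "sentence": s,
            "score": check_sentence(context, s),
            "flagged": check_sentence(context, s) < threshold,
        }
        for s in sentences
    ]
```

Because the verifier runs after generation, it slots in behind any LLM without changing prompts or retrieval logic; flagged sentences can be surfaced to a human reviewer or suppressed automatically.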

Oumi has released two versions: a generative 8B model that provides detailed analysis, and a classifier model that delivers only a score but with greater computational efficiency.

HallOumi vs RAG vs Guardrails for enterprise AI hallucination protection

What sets HallOumi apart from other grounding approaches is that it complements rather than replaces existing techniques like RAG, while offering more detailed analysis than typical guardrails.

“The input document that you feed through the LLM could be RAG,” Koukoumidis said. “In some other cases, it’s not precisely RAG, because people say, ‘I’m not retrieving anything. I already have the document I care about. I’m telling you, that’s the document I care about. Summarize it for me.’ So HallOumi can apply to RAG but not just RAG scenarios.”

This distinction is important because while RAG aims to improve generation by providing relevant context, HallOumi verifies the output after generation regardless of how that context was obtained.

Compared to guardrails, HallOumi provides more than binary verification. Its sentence-level analysis with confidence scores and explanations gives users a detailed understanding of where and how hallucinations occur.

HallOumi incorporates a specialized form of reasoning in its approach. 

“There was definitely a variant of reasoning that we did to synthesize the data,” Koukoumidis explained. “We guided the model to reason step-by-step or claim by sub-claim, to think through how it should classify a bigger claim or a bigger sentence to make the prediction.”

The model can also detect not just accidental hallucinations but intentional misinformation. In one demonstration, Koukoumidis showed how HallOumi identified when DeepSeek’s model ignored provided Wikipedia content and instead generated propaganda-like content about China’s COVID-19 response.

What this means for enterprise AI adoption

For enterprises looking to lead the way in AI adoption, HallOumi offers a potentially crucial tool for safely deploying generative AI systems in production environments.

“I really hope this unblocks many scenarios,” Koukoumidis said. “Many enterprises can’t trust their models because existing implementations weren’t very ergonomic or efficient. I hope HallOumi enables them to trust their LLMs because they now have something to instill the confidence they need.”

For enterprises on a slower AI adoption curve, HallOumi’s open-source nature means they can experiment with the technology now while Oumi offers commercial support options as needed.

“If any companies want to better customize HallOumi to their domain, or have some specific commercial way they should use it, we’re always very happy to help them develop the solution,” Koukoumidis added.

As AI systems continue to advance, tools like HallOumi may become standard components of enterprise AI stacks—essential infrastructure for separating AI fact from fiction.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Ivanti warns customers of new critical flaw exploited in the wild

“The vulnerability is a buffer overflow with a limited character space, and therefore it was initially believed to be a low-risk denial-of-service vulnerability,” incident responders from Google-owned Mandiant wrote in a report on the flaw. “We assess it is likely the threat actor studied the patch for the vulnerability in

Read More »

A look back at Microsoft’s IPO

Speaking of good fortune, Fortune magazine was granted inside access to Gates, his executive and legal teams, and their Wall Street partners in the months leading up to the IPO. That arrangement resulted in a terrific fly-on-the-wall story published four months later. A few highlights gleaned from that story and

Read More »

ServiceNow to acquire Logik.ai to boost CRM portfolio

“With CPQ more seamlessly embedded into the sales and order management capabilities, sellers can increase productivity by exponentially reducing time towards building sales quotes and recording opportunities in the system. But also, as the system learns, it can also recommend the right products and services to add to a particular

Read More »

Brookfield to Buy Colonial Pipeline Owner in $9B Deal

A group of investors led by Brookfield Infrastructure Partners LP agreed to acquire Colonial Enterprises Inc. in a deal that values the operator of the biggest US fuel pipeline at about $9 billion. Colonial’s five owners are selling their entire stakes to Brookfield, including a Shell plc unit that will transfer its roughly 16 percent interest for $1.45 billion, according to a statement Friday.  Colonial Pipeline operates one of the most important fuel conduits in the US, hauling more than 100 million gallons (2.5 million barrels) of fuel a day from Gulf Coast refineries to the Northeast. It was shut down for five days in 2021 after a cyberattack, leading to fuel shortages across the region.  The deal to buy it comes as a glacial federal permitting process and political opposition continue to make building new pipelines in the US exceedingly difficult – despite US President Donald Trump’s push to expand domestic energy infrastructure. Shares of Brookfield Infrastructure Partners fell 4.6 percent in New York Friday amid the broader market sell off.  Shell’s Midstream Operating unit will sell its stake to a Brookfield unit called Colossus AcquireCo. Colonial’s other owners are the industrial conglomerate Koch Inc., with 28.1 percent, a unit of private equity firm KKR & Co., with 23.4 percent, Canadian pension fund Caisse de Depot et Placement du Quebec, with 16.5 percent, and infrastructure owner IFM Investors Pty with a 15.8 percent share.  Brookfield has already invested in global pipeline assets. It owns a controlling stake in Brazil’s NTS pipeline that spans more than 2,000 kilometers. The asset manager was also part of a consortium that bought a $10.1 billion stake in Abu Dhabi’s natural-gas pipelines in 2020. Colonial is in the midst of a fight with oil majors and trading houses including Exxon Mobil Corp. and Trafigura that ship fuels along its

Read More »

How FERC is working to boost power supplies while managing the price tag

Amy Akers is a senior counsel at Clark Hill. Throughout the last five years the Federal Energy Regulatory Commission has made significant efforts to increase power supply while managing the consumer price tag through transmission reform.  Beginning in June 2021, under docket RM21-17, FERC invited public comment on three specific topics (1) reforms for longer-term regional transmission planning and cost-allocation processes that take into account anticipated future generation, (2) rethinking cost responsibility for regional transmission facilities and interconnection-related network upgrades, and (3) enhanced transmission oversight over how new transmission facilities are identified and paid for.  In April 2022, after resounding industry support, in the same docket, FERC issued a Notice of Proposed Rulemaking regarding the resiliency and reliability of the electric grid through regional transmission planning and cost allocation, and generator interconnection.  However, motivated by the excessive interconnection queue in 2022 of over 2,000 GW of potential generation and storage capacity across the nation represented by over 10,000 active interconnection requests, FERC issued a NOPR in June 2022 under docket RM22-14 on the single question of reforming and streamlining generation interconnection processes and procedures.  FERC approves transmission, interconnection reforms Maintaining its momentum of reducing the interconnection backlog, FERC issued Order 2023 in July 2023, which among other requirements, ensures interconnection access to new technologies. Nearly a year later, in May 2024, under docket RM21-17, FERC issued Order 1920 requiring transmission providers to conduct long-term planning over a 20-year horizon for future regional transmission facilities including the cost-effective expansion of transmission being replaced and determining how it will all be paid for.  
As is common practice, FERC issued rehearing orders for Orders 2023 and 1920 respectively titling them 2023-A and 1920-A, each largely providing clarifications to the original orders.  New initiatives Continuing its effort to improve grid efficiency, FERC issued an advance

Read More »

Norway State Fund Invests in RWE Wind Projects Offshore Denmark, Germany

Norges Bank Investment Management (NBIM), the fund management arm of Norway’s central bank, has signed an agreement with RWE to acquire a 49 percent stake in the German utility’s Nordseecluster and Thor offshore wind projects in Germany and Denmark respectively. The purchase price is EUR 1.4 billion ($1.55 billion). RWE will retain 51 percent. “RWE remains in charge of construction and operations throughout the lifecycle of these offshore wind farms”, RWE said in an online statement. Both projects are under construction. Their combined capacity will be enough to power over 2.6 million households in Germany and Denmark according to RWE. Nordseecluster, about 50 kilometers (31.07 miles) north of Juist island on Germany’s side of the North Sea, will have a capacity of 1.6 gigawatts (GW). It is being built in 2 phases. The 660-megawatt (MW) phase 1 is planned to be commissioned 2027. The 900-MW phase 2 is expected to be commissioned 2029. Thor, around 22 kilometers off the west coast of the Jutland peninsula, will produce up to 1.1 GW. It will be Denmark’s biggest offshore wind farm according to RWE. Commissioning is targeted 2027. “The projects will have long-term contracted revenues that provide stable cash flows and reduce risk to the projects”, RWE said. NBIM’s investment will raise the value of the projects to approximately EUR 2.87 billion. “In total, Norges Bank Investment Management’s expected commitment to acquire and fund its share of constructing the wind farms will be approximately 4,000 million euros”, NBIM said separately. “No external debt financing will be involved in the transaction”. The parties expect to complete the transaction in the third quarter, subject to customary approvals. NBIM is tasked with managing Norway’s oil and gas revenue through the Government Pension Fund Global. Last year renewable energy comprised 0.1 percent of NBIM’s investments. It

Read More »

INEOS Completes Acquisition of CNOOC Assets in US Gulf

The INEOS Group has completed its purchase of China National Offshore Oil Corp.’s (CNOOC) stakes on the United States side of the Gulf of Mexico. The acquisition consisted of non-operating stakes in deepwater early-production projects Appomattox and Stampede, as well as “several mature assets and supporting business”, according to INEOS. CNOOC held a 25 percent stake in Stampede, operated by Hess Corp., and 21 percent in Appomattox, operated by INEOS’ fellow British company Shell PLC. The new assets raise the global production of INEOS’ energy arm to over 90,000 barrels of oil equivalent a day (boed), according to diversified company INEOS. “The USA is a very attractive place for INEOS Energy to invest”, INEOS Energy chief executive David Bucknall said in a statement. “This is our third deal in three years following the 1.4 mtpa LNG deal with Sempra and the acquisition of Chesapeake Energy’s oil and gas assets in South Texas. “Total capital spend on energy assets in the USA now exceeds $3 billion, providing a strong platform for future growth”. INEOS Energy chair Brian Gilvary commented, “This is a major step for us into the deepwater US Gulf, which builds on our growing energy business”. “INEOS Energy is all about competing in the energy transition to provide reliable, affordable energy to meet world demand as the population continues to grow – and progressing carbon storage projects”, Gilvary added. For CNOOC, the divestments will help it “optimize” its global portfolio, CNOOC International Ltd. chair Liu Yongjie said in a statement December 14, 2024, announcing the transaction agreement. Early last year Shell put into production a subsea tie-back to the Appomattox floating production hub, adding an estimated peak production of 16,000 boed. Located in the Mississippi Canyon, the Rydberg project has estimated proven and probable reserves of 38 million boe, according

Read More »

LNG Cargoes Land at Wider Discounts in Europe

It’s getting cheaper to bring a shipment of liquefied natural gas into Europe because of heightening competition between terminals to accommodate extra cargoes. The delivered price of LNG for northwest Europe widened its discount to the continental benchmark Title Transfer Facility in recent weeks, according to data from Spark Commodities Pte Ltd. The price difference was as much as minus 71.5 cents last week, according to the data. Imports in western Europe reached their highest level for March in records going back to 2017, according to ship-tracking data compiled by Bloomberg. That’s happening as demand in Asia weakens, most noticeably in China, and Europe prepares to refill depleted inventories during the summer.  Greece’s Public Power Corp. SA last week bought an LNG cargo on a DES basis for May delivery at roughly a 70-cent discount to the TTF benchmark.  The widening difference demonstrates an increase in demand for delivery slots at European terminals, said Qasim Afghan, a commercial analyst at Spark. As a result, all those facilities are now in the money, he said. What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is a new social experience created for you and all energy professionals to Speak Up about our industry, share knowledge, connect with peers and industry insiders and engage in a professional community that will empower your career in energy. MORE FROM THIS AUTHOR Bloomberg

Read More »

Power Moves: rahd.AI’s new chief operating officer and more

Innes Grant has been appointed as the chief operating officer for decommissioning tech company rahd.AI as it looks to scale up its operations in Aberdeen. Grant has previously worked in a similar role for 24 years with digital consulting giant Avanade, a joint venture between Microsoft and Accenture. Founded in Perth, Western Australia, the Scotland-based company uses artificial intelligence in its platform, which aims to reduce the cost of decommissioning oil and gas infrastructure. The group has successfully completed pilots in both Australia and the UK. Grant said: “We want to become the global default for facilitating decommissioning – the same way Skyscanner became the default for travel. And in doing so, we can save governments, operators and ultimately taxpayers tens of billions of pounds. “Earlier this year, Prime Minister Sir Keir Starmer posed the question on whether the UK wanted to be an AI taker or an AI maker. There is a global race for jobs of tomorrow and we can anchor many of these jobs in Aberdeen as we scale up this game-changing technology. “Our platform not only aligns with the UK government’s AI ambition, it also helps it tackle one of its most expensive industrial problems – reducing the cost of oil and gas decommissioning.” rahd.AI is a portfolio company of Ventex, the Aberdeen-based climate tech venture studio led by Steve Gray and Stuart McLeod, who both have a track record of success in building global businesses. The business is led by energy tech specialist Jake Stride, a former global technology strategist for Microsoft and current board member of Subsea Energy Australia. © Supplied by VattenfallVattenfall head of business area wind Helene Biström. Helene Biström has stepped down as Vattenfall’s head of business area wind. Having served in the role since 2021, Biström will remain in her role

Read More »

New Intel CEO Lip-Bu Tan begins to lay out technology roadmap

He said that in the past, Intel designed hardware, then partners had to figure out developing the software to make it work. “The world has changed. You have to flip that around. Going forward, we will start with the problem, what you’re trying to solve, and the workloads you need to handle enable. Then we work backward from that, that require embrace the software 2.0 mentality, which means that having a software-first design,” said Tan. Analysts in attendance liked what they heard, even if it was limited in specificity. “What was clear to me was Tan will be focused on eliminating distractions, investing in talent and making sure the company has a more compelling roadmap to compete in the AI data center race,” said Daniel Newman, CEO of The Futurum Group. He said there was a cautious optimism evident at the event as the certainty of its new leadership provided a boost for its partners and employees. “However, there are still more questions than answers, and that should be expected, given his recent arrival and clear philosophy about what needs to come next, which in many ways starkly contrasted what came before,” said Newman. Bob O’Donnell, president and chief analyst with TECHnalysis Research, said the strategy that Tan discussed at his keynote isn’t really much different than those described by his predecessor: build great products and a great foundry business. “That’s not necessarily a bad thing, though, because I believe they’re ultimately the right things for the company to pursue. The difference is that Lip-Bu seemed more willing to tackle the challenge of right-sizing Intel and mentioned cutting things that aren’t core to the business. The big unanswered question is, however, what does he consider those areas/products to be so, as always, the devil is in the details,” he said.

Read More »

Tariff war throws building of data centers into disarray

Forrester’s bottom line? “Because of the long term planning and all of the potential policy changes, I wouldn’t change my data center plans that much,” Nguyen said. Confusion reigns Every day it seems, the tariff situation becomes muddier. For example, according to a fact sheet released Wednesday, the White House has temporarily exempted semiconductors from tariffs, but not the aluminum used to build the servers and racks that house them. Furthermore, Scott Bickley, advisory fellow at the Info-Tech Research Group, said it is important to note how the various countries match with the various components. “Just about every major cost center for the buildout of a data center will be severely impacted by the new tariffs. Servers and hardware, including semiconductors, memory, network components, cabling, construction materials are going to see prices rise overnight once the tariffs go into effect,” Bickley said. “Consider that China, which has a 54% full tariff, is a major source of raw materials and rare earth elements essential for manufacturing DC components while Taiwan, at a 32% tariff rate, is the sole-source provider country for most advanced chipsets used in AI, cell phones, and any modern application footprint requiring high performance in a small footprint. South Korea (25% tariff) is a key provider of memory chips, while Japan (24%), Germany (20% EU rate), and the Netherlands (20% EU rate) are providers of sub-components like server racks, cooling systems, and semiconductor equipment.” But, he continued: “Now factor in the offshore/nearshore contract manufacturers like Mexico and Vietnam (46%) for electronics manufacturing (assembly and distribution) and Malaysia (10%) for semiconductor packaging, and it is clear to see that the complete technology supply chain leading into the data center will be taxed at multiple touchpoints.” Put all of that together and Info-Tech anticipates a lot of enterprise data center pain.

Read More »

New MLCommons benchmarks to test AI infrastructure performance

The latest release also broadens its scope beyond chatbot benchmarks. A new graph neural network (GNN) test targets datacenter-class hardware and is designed for workloads like fraud detection, recommendation engines, and knowledge graphs. It uses the RGAT model based on a graph dataset containing over 547 million nodes and 5.8 billion edges. Judging performance Analysts suggest that these benchmarks will make it easier to judge the performance of various hardware chips and clusters based on documented models. “As every chipmaker seeks to prove that its hardware is good enough to support AI, we now have a standard benchmark that shows the quality of question support, math, and coding skills associated with hardware,” said Hyoun Park, CEO and Chief Analyst at Amalgam Insights.  Chipmakers can now compete not just on traditional speeds and feeds, but in mathematical skill and informational accuracy. This benchmark provides a rare opportunity to add new performance standards on cross-vendor hardware, Park added. “The latency in terms of how quickly tokens are delivered and the time for the user to see the response is the deciding factor,” said Neil Shah, partner and co-founder at Counterpoint Research. “This is where players such as NVIDIA, AMD, and Intel have to get the software right to help developers optimize the models and bring out the best compute performance.” Benchmarking and buying decisions Independent benchmarks like those from MLCommons play a key role in helping buyers evaluate system performance, but relying on them alone may not provide the full picture.

Read More »

Potential Nvidia chip shortage looms as Chinese customers rush to beat US sales ban

Will it lead to shortages? The US first placed export controls on chips sent to China in October 2022 as a means to slow the country’s technological advances. It blocked the sale of Nvidia’s A100 and H100 chips, leading the company to develop the less powerful A800 and H800 chips for the market; they were also subsequently banned. There was a surge in demand for the H20 following the arrival of Chinese startup DeepSeek’s ultra low-cost, open-source AI model in January. And while the H20 is reported to be 15 times slower than Nvidia’s newest Blackwell chips sold elsewhere in the world, it was designed specifically by Nvidia to comply with the further US export controls introduced in October 2023. It is being used by Chinese companies for training, although it’s billed as an inference chip, explained Matt Kimball, VP and principal analyst for datacenter compute and storage at Moor Insights & Strategy. Should Nvidia choose to focus its efforts on manufacturing more of the chips, Kimball said he doesn’t think it will impact supply in the US and Europe, as Blackwell is the main product sold in those markets and H20 is an N-1 Hopper architecture chip. “If you take this a step further and ask whether this large order slows down the production of chips destined for the US and Europe, I’d say the answer is no, as the Hopper family is built on a different process node than the Blackwell family,” he said. Still, Kimball noted, “supply chain management is difficult, especially for smaller organizations that are put to the back of the line as hyperscalers with multibillion dollar orders are first in line for the newest [chips].”

Read More »

European cloud group invests to create what it dubs “Trump-proof cloud services”

But analysts have questioned whether the Microsoft move truly addresses those European business concerns. Phil Brunkard, executive counselor at Info-Tech Research Group UK, said, commenting on last month’s announcement of the EU Data Boundary for the Microsoft Cloud,  “Microsoft says that customer data will remain stored and processed in the EU and EFTA, but doesn’t guarantee true data sovereignty.” And European companies are now rethinking what data sovereignty means to them. They are moving beyond having it refer to where the data sits to focusing on which vendors control it, and who controls them. Responding to the new Euro cloud plan, another analyst, IDC VP Dave McCarthy, saw the effort as “signaling a growing European push for data control and independence.” “US providers could face tougher competition from EU companies that leverage this tech to offer sovereignty-friendly alternatives. Although €1 million isn’t a game-changer on its own, it’s a clear sign Europe wants to build its own cloud ecosystem—potentially at the expense of US market share,” McCarthy said. “For US providers, this could mean investing in more EU-based data centers or reconfiguring systems to ensure European customers’ data stays within the region. This isn’t just a compliance checkbox. It’s a shift that could hike operational costs and complexity, especially for companies used to running centralized setups.” Adding to the potential bad news for US hyperscalers, McCarthy said that there was little reason to believe that this trend would be limited to Europe. “If Europe pulls this off, other regions might take note and push for similar sovereignty rules. US providers could find themselves adapting to a patchwork of regulations worldwide, forcing a rethink of their global strategies,” McCarthy said. “This isn’t just a European headache, it’s a preview of what could become a broader challenge.”

Read More »

Talent gap complicates cost-conscious cloud planning

The top strategy so far is what one enterprise calls the “Cloud Team.” You assemble all your people with cloud skills, and your own best software architect, and have the team examine current and proposed cloud applications, looking for a high-level approach that meets business goals. In this process, the team tries to avoid implementation specifics, focusing instead on the notion that a hybrid application has an agile cloud side and a governance-and-sovereignty data center side, and what has to be done is push functionality into the right place. The Cloud Team supporters say that an experienced application architect can deal with the cloud in abstract, without detailed knowledge of cloud tools and costs. For example, the architect can assess the value of using an event-driven versus transactional model without fixating on how either could be done. The idea is to first come up with approaches. Then, developers could work with cloud providers to map each approach to an implementation, and assess the costs, benefits, and risks. Ok, I lied about this being the top strategy—sort of, at least. It’s the only strategy that’s making much sense. The enterprises all start their cloud-reassessment journey on a different tack, but they agree it doesn’t work. The knee-jerk approach to cloud costs is to attack the implementation, not the design. What cloud features did you pick? Could you find ones that cost less? Could you perhaps shed all the special features and just host containers or VMs with no web services at all? Enterprises who try this, meaning almost all of them, report that they save less than 15% on cloud costs, a rate of savings that means roughly a five-year payback on the costs of making the application changes…if they can make them at all. Enterprises used to build all of

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. Moline, Illinois-based John Deere has been in business for 187 years, yet as a non-tech company it has become a regular at the big tech trade show in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for clients and recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and the U.S. National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see whether knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. 
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »