Stay Ahead, Stay ONMINE

ByteDance’s UI-TARS can take over your computer, outperforms GPT-4o and Claude

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More A new AI agent has emerged from the parent company of TikTok to take control of your computer and perform complex workflows. Much like Anthropic’s Computer Use, ByteDance’s new UI-TARS understands graphical user interfaces (GUIs), applies […]

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


A new AI agent has emerged from the parent company of TikTok to take control of your computer and perform complex workflows.

Much like Anthropic’s Computer Use, ByteDance’s new UI-TARS understands graphical user interfaces (GUIs), applies reasoning and takes autonomous, step-by-step action. 

Trained on roughly 50B tokens and offered in 7B and 72B parameter versions, the PC/MacOS agents achieves state-of-the-art (SOTA) performance on 10-plus GUI benchmarks across performance, perception, grounding and overall agent capabilities, consistently beating out OpenAI’s GPT-4o, Claude and Google’s Gemini.

“Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention,” researchers from ByteDance and Tsinghua University write in a new research paper

Source: Arxiv

How UI-TARS explains its thinking

UI-TARS works across desktop, mobile and web applications, using multimodal inputs (text, images, interactions) to understand visual environments.

Its UI features two tabs — one to the left showing its step-by-step “thinking,” and a larger one to the right where it pulls up files, websites and apps and automatically takes action. 

For example, in a demo video released today, the model is prompted to “Find round trip flights from SEA to NYC on the 5th and return on the 10th next month and filter by price in ascending order.”

In response, UI-TARS navigates to the website for Delta Airlines, fills in the “from” and “to” fields, clicks in the relevant dates and sorts and filters by price, explaining each step in its thinking box before taking action. 

In another scenario, it is instructed to install an autoDocstring extension in VS Code. Here is its step-by-step thinking as it completes that task: 

  • It first reports that “to proceed with installing the autoDocstring extension in VS Code, I first need to open the VS Code application.” 
  • Recognizing that the VS Code window has not fully loaded, the model notes that “it is necessary to wait for the application to finish initializing before proceeding with the installation of the autoDocstring extension. This ensures that all required functionalities are available for the next steps.” 
  • It then explains that it must access the Extensions view in VS Code, which can be done by clicking on a tab in the left sidebar. 
  • Running into a small glitch, it reasons: “The previous attempt to click on the ‘Extensions’ tab may not have been precise enough, so I will try clicking on it again to ensure the Extensions view opens successfully.”
  • Once in the Extensions view, it types in ‘autoDocstring’ before concluding: “Since the installation process has already been initiated, there is no further action required at this moment. I will wait for the installation to complete and confirm its success once the process is finished.”

Outperforming its rivals

Across a variety of benchmarks, researchers report that UI-TARS consistently outranked OpenAI’s GPT-4o; Anthropic’s Claude-3.5-Sonnet; Gemini-1.5-Pro and Gemini-2.0; four Qwen models; and numerous academic models.

For instance, in VisualWebBench — which measures a model’s ability to ground web elements including webpage quality assurance and optical character recognition — UI-TARS 72B scored 82.8%, outperforming GPT-4o (78.5%) and Claude 3.5 (78.2%). 

It also did significantly better on WebSRC benchmarks (understanding of semantic content and layout in web contexts) and ScreenQA-short (comprehension of complex mobile screen layouts and web structure). UI-TARS-7B achieved leading scores of 93.6% on WebSRC, while UI-TARS-72B achieved 88.6% on ScreenQA-short, outperforming Qwen, Gemini, Claude 3.5 and GPT-4o. 

“These results demonstrate the superior perception and comprehension capabilities of UI-TARS in web and mobile environments,” the researchers write. “Such perceptual ability lays the foundation for agent tasks, where accurate environmental understanding is crucial for task execution and decision-making.”

UI-TARS also showed impressive results in ScreenSpot Pro and ScreenSpot v2 , which assess a model’s ability to understand and localize elements in GUIs. Further, researchers tested its capabilities in planning multi-step actions and low-level tasks in mobile environments, and benchmarked it on OSWorld (which assesses open-ended computer tasks) and AndroidWorld (which scores autonomous agents on 116 programmatic tasks across 20 mobile apps). 

Source: Arxiv
Source: Arxiv

Under the hood

To help it take step-by-step actions and recognize what it’s seeing, UI-TARS was trained on a large-scale dataset of screenshots that parsed metadata including element description and type, visual description, bounding boxes (position information), element function and text from various websites, applications and operating systems. This allows the model to provide a comprehensive, detailed description of a screenshot, capturing not only elements but spatial relationships and overall layout. 

The model also uses state transition captioning to identify and describe the differences between two consecutive screenshots and determine whether an action — such as a mouse click or keyboard input — has occurred. Meanwhile, set-of-mark (SoM) prompting allows it to overlay distinct marks (letters, numbers) on specific regions of an image. 

The model is equipped with both short-term and long-term memory to handle tasks at hand while also retaining historical interactions to improve later decision-making. Researchers trained the model to perform both System 1 (fast, automatic and intuitive) and System 2 (slow and deliberate) reasoning. This allows for multi-step decision-making, “reflection” thinking, milestone recognition and error correction. 

Researchers emphasized that it is critical that the model be able to maintain consistent goals and engage in trial and error to hypothesize, test and evaluate potential actions before completing a task. They introduced two types of data to support this: error correction and post-reflection data. For error correction, they identified mistakes and labeled corrective actions; for post-reflection, they simulated recovery steps. 

“This strategy ensures that the agent not only learns to avoid errors but also adapts dynamically when they occur,” the researchers write.

Clearly, UI-TARS exhibits impressive capabilities, and it’ll be interesting to see its evolving use cases in the increasingly competitive AI agents space. As the researchers note: “Looking ahead, while native agents represent a significant leap forward, the future lies in the integration of active and lifelong learning, where agents autonomously drive their own learning through continuous, real-world interactions.”

Researchers point out that Claude Computer Use “performs strongly in web-based tasks but significantly struggles with mobile scenarios, indicating that the GUI operation ability of Claude has not been well transferred to the mobile domain.” 

By contrast, “UI-TARS exhibits excellent performance in both website and mobile domain.” 

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

HPE, Nvidia expand AI partnership

In addition, the company announced the HPE Cray Supercomputing GX240 liquid-cooled compute blade for its GX5000 platform. The GX240 starts with 16 Nvidia Vera CPUs per blade and scales to 40 blades per rack, supporting up to 640 Nvidia Vera CPUs and 56,320 ARM cores per rack. In addition, HPE

Read More »

Quantum Elements cuts quantum error rates using AI-powered digital twin

“That’s pretty clever, actually,” Sutor says. “It’s a little microwave pulse. That fixes some of the errors.” The Quantum Elements paper specifically addressed quantum error correction in IBM’s 127-qubit superconducting processor. But these techniques might also be able to be generalized to other types of quantum computers, Sutor says. And

Read More »

Energy Department Announces $500 Million to Strengthen Domestic Critical Materials Processing and Manufacturing

 Funding will expand domestic manufacturing of battery supply chains for defense, grid resilience, transportation, manufacturing and other industries WASHINGTON—The U.S. Department of Energy’s (DOE) Office of Critical Minerals and Energy Innovation (CMEI) today announced a Notice of Funding Opportunity (NOFO) for up to $500 million to expand U.S. critical mineral and materials processing and derivative battery manufacturing and recycling. Assistant Secretary of Energy (EERE) Audrey Robertson is currently in Japan meeting with regional allies at the Indo-Pacific Energy Security Ministerial and Business Forum (IPEM) to advance shared efforts on supply chain resilience and energy security issues. Her engagements at IPEM underscore the importance of close cooperation with partners as the United States strengthens its supply chain through this NOFO. “For too long, the United States has relied on hostile foreign actors to supply and process the critical materials that are essential in battery manufacturing and materials processing,” said U.S. Energy Secretary Chris Wright. “Thanks to President Trump’s leadership, the Department of Energy is playing a leading role in strengthening these domestic industries that will position the U.S. to win the AI race, meeting rising energy demand, and achieve energy dominance.” “I am delighted to be in Japan meeting with our allies, underscoring the important connection between critical materials and energy security,” said Assistant Secretary of Energy (EERE) Audrey Robertson. “Critical minerals processing is a vital component of our nation’s critical minerals supply base. Boosting domestic production, including through recycling, will bolster national security and ensure the United States and our partners are prepared to meet the energy challenges of the 21st century.” Funding awarded through this NOFO will support demonstration and/or commercial facilities for processing, recycling, or utilizing for manufacturing of critical materials which may include traditional battery minerals such as lithium, graphite, nickel, copper, aluminum, as well as other

Read More »

Energy Department Announces $293 Million in Funding to Support Genesis Mission National Science and Technology Challenges

WASHINGTON—The U.S. Department of Energy (DOE) today announced funding to advance the Genesis Mission’s efforts to tackle the nation’s most complex science and technology challenges. This includes a $293 million Request for Application (RFA),“The Genesis Mission: Transforming Science and Energy with AI.” Through this RFA, DOE invites interdisciplinary teams to leverage novel AI models and frameworks to address over 20 national challenges spanning advanced manufacturing, biotechnology, critical materials, nuclear energy, and quantum information science.    “The Genesis Mission has caught the imagination of our scientific and engineering communities to tackle national challenges in the age of AI,” said Under Secretary for Science Darío Gil and Genesis Mission Director. “With these investments we seek breakthrough ideas and novel collaborations leveraging the scientific prowess of our National Laboratories, the private sector, universities, and science philanthropies.”  The RFA is open to interdisciplinary teams from DOE National Laboratories, U.S. industry, and academia. Phase I awards will range from $500,000 to $750,000 and will support a nine month project period. Phase II awards will range from $6 million to $15 million over a three year project period. Teams may apply directly to either phase in FY 2026, and successful Phase I teams will be eligible to compete for larger Phase II awards in future cycles. Phase I applications and Phase II letters of intent are due April 28, 2026. Phase II applications are due May 19, 2026. DOE plans to hold an informational webinar about this RFA on March 26, 2026.  For full eligibility, application instructions, and challenge details, see the official NOFO: DE-FOA-0003612. Registration instructions and other details will be posted here.  ### 

Read More »

Trump Administration Keeps Coal Plant Open to Ensure Affordable, Reliable and Secure Power in the Northwest

Emergency order addresses critical grid reliability issues, lowering risk of blackouts and ensuring affordable electricity access. WASHINGTON—U.S. Secretary of Energy Chris Wright today issued an emergency order to ensure Americans in the Northwestern region of the United States have access to affordable, reliable and secure electricity. The order directs TransAlta to keep Unit 2 of the Centralia Generating Station in Centralia, Washington available to operate. Unit 2 of the coal plant was scheduled to shut down at the end of 2025. The reliable supply of power from the Centralia plant is essential to maintaining grid stability across the Northwest, and this order ensures that the region avoids unnecessary blackout risks and costs. “The last administration’s energy subtraction policies had the United States on track to likely experience significantly more blackouts in the coming years — thankfully, President Trump won’t let that happen,” said Energy Secretary Wright. “The Trump administration will continue taking action to keep America’s coal plants running so we can stop the price spikes and ensure we don’t lose critical generation sources. Americans deserve access to affordable, reliable, and secure energy to power their homes all the time, regardless of whether the wind is blowing or the sun is shining.” Thanks to President Trump’s leadership, coal plants across the country are reversing plans to shut down. On December 16, 2025, Secretary Wright issued an emergency order directing TransAlta to keep Unit 2 (729.9 MW) available to operate.According to DOE’s Resource Adequacy Report, blackouts were on track to potentially increase 100 times by 2030 if the U.S. continued to take reliable power offline as it did during the Biden administration. This order is in effect beginning on March 17, 2026, through June 14, 2026. ### 

Read More »

Brent retreats from highs after Trump signals Iran war nearing end

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } Oil futures eased from recent highs Tuesday as markets reacted to comments from US President Donald Trump suggesting the war with Iran may be nearing its conclusion, easing concerns about prolonged disruptions to Middle East crude supplies. Brent crude had climbed above $100/bbl amid escalating tensions in the region and fears that the war could prolong disruptions to shipments through the Strait of Hormuz—one of the world’s most critical energy chokepoints and a transit route for roughly one-fifth of global oil supply. Prices pulled back after Pres. Trump said the war was “almost done,” prompting traders to reassess the risk premium that had built into crude markets during the latest escalation. The earlier gains were driven by the fact that the war had disrupted tanker traffic in the Strait of Hormuz, raising concerns about wider supply disruptions from major Gulf oil producers. While the latest remarks helped calm markets, analysts note that geopolitical risks remain elevated and price volatility is likely to persist as traders monitor developments in the region. Any renewed escalation could quickly send crude prices higher again.

Read More »

Southwest Arkansas lithium project moves toward FID with 10-year offtake deal

Smackover Lithium, a joint venture between Standard Lithium Ltd. and Equinor, through subsidiaries of Equinor ASA, signed the first commercial offtake agreement for the South West Arkansas Project (SWA Project) with commodities group Trafigura Trading LLC. Under the terms of a binding take-or-pay offtake agreement, the JV will supply Trafigura with 8,000 metric tonnes/year (tpy) of battery-quality lithium carbonate (Li2CO3) over a 10-year period, beginning at the start of commercial production. Smackover Lithium is expected to achieve final investment decision (FID) for the project, which aims to use direct lithium extraction technology to produce lithium from brine resources in the Smackover formation in southern Arkansas, in 2026, with first production anticipated in 2028. The project encompasses about 30,000 acres of brine leases in the region, with the initial phase of project development focused on production from the 20,854-acre Reynolds Brine Unit.   Front-end engineering design was completed in support of a definitive feasibility study with a principal recommendation that the project is ready to progress to FID.  While pricing terms of the Trafigura deal were kept confidential, Standard Lithium said they are “structured to support the anticipated financing for the project.” The JV is seeking to finalize customer offtake agreements for roughly 80% of the 22,500 tonnes of annual nameplate lithium carbonate capacity for the initial phase of the project. This agreement represents over 40% of the targeted offtake commitments. Formed in 2024, Smackover Lithium is developing multiple DLE projects in Southwest Arkansas and East Texas. Standard Lithium is operator of the projecs with 55% interest. Equinor holds the remaining 45% interest.

Read More »

Equinor makes oil and gas discoveries in the North Sea

Equinor Energy AS discovered oil in the Troll area and gas and condensate in the Sleipner area of the North Sea. Byrding C discovery well 35/11-32 S in production license (PL) 090 HS was made 5 km northwest of Fram field in Troll. The well was drilled by the COSL Innovator rig in 373 m of water to 3,517 m TVD subsea. It was terminated in the Heather formation from the Middle Jurassic. The primary exploration target was to prove petroleum in reservoir rocks from the Late Jurassic deep marine equivalent to the Sognefjord formation. The secondary target was to prove petroleum and investigate the presence of potential reservoir rocks in two prospective intervals from the Middle Jurassic in deep marine equivalents to the Fensfjord formation. The well encountered a 22-m oil column in sandstone layers in the Sognefjord formation with a total thickness of 82 m, of which 70 m was sandstone with moderate to good reservoir properties. The oil-water contact was encountered. The secondary exploration target in the Fensfjord formation did not prove reservoir rocks or hydrocarbons. The well was not formation-tested, but data and samples were collected. The well has been permanently plugged. Preliminary estimates indicate the size of the discovery is 4.4–8.2 MMboe. Oil discovered in Byrding C will be produced using existing or future infrastructure in the area. The Frida Kahlo discovery was drilled from the Sleipner B platform in production license PL 046 northwest of Sleipner Vest and is estimated to contain 5–9 MMboe of gas and condensate. The well will be brought on stream as early as April. The four most recent exploration wells in the Sleipner area, drilled over a 3-month period, include Lofn, Langemann, Sissel, and Frida Kahlo. All have all proven gas and condensate in the Hugin formation, with combined estimated

Read More »

System-level ‘coopetition’: Why Nvidia’s DGX Rubin NVL8 runs on Intel Xeon 6

Not a strategic alliance Despite working together at the system level, the relationship between the two companies does not amount to a formal strategic alliance. “The Intel–Nvidia dynamic is best understood as system-level coopetition. Long-standing collaboration persists across data center and PC ecosystems, with Intel CPUs paired alongside Nvidia GPUs forming standardized AI server architectures and enabling deeper integration,” said Manish Rawat, semiconductor analyst at TechInsights. However, competition is accelerating structurally. Even though Nvidia dominates the GPU space, the company is also expanding its presence across more layers of the data-center stack. It has been developing its own CPUs, such as the Grace CPU, aimed at tighter integration between compute, memory, and interconnect. The company has also launched Vera CPU, purpose-built for agentic AI at GTC 2026. This reflects Nvidia’s broader approach of building more of the system in-house, spanning both hardware and software, even as it continues to incorporate external components where required. “Nvidia’s push into CPUs (Grace, Vera) and tightly integrated, NVLink-based systems signals a shift toward full-stack ownership spanning compute, networking, and software. This challenges Intel’s traditional dominance in CPUs and system control. In essence, Nvidia is partnering tactically to sustain ecosystem adoption while strategically positioning to displace incumbents and capture greater control of next-generation AI infrastructure,” added Rawat.

Read More »

Nvidia announces Vera Rubin platform, signaling a shift to full-stack AI infrastructure

The transition reflects a deeper move from optimizing individual components to engineering entire systems for scalability and efficiency, said Sanchit Vir Gogia, chief analyst at Greyhound Research. “Compute, memory behavior, interconnect bandwidth, and workload orchestration are being engineered together,” Gogia said. “Even physical design choices such as rack modularity, serviceability, and assembly efficiency are now part of performance engineering. Infrastructure is beginning to resemble an appliance at scale, but one that operates at extreme density and complexity.” Industry observers said rack-scale systems, including Nvidia’s NVL72 and open standards such as OCP Open Rack, are enabling more flexible pooling and orchestration of infrastructure resources for AI and machine learning workloads. “I am also seeing other operators are increasingly adopting chip-to-grid strategies, integrating onsite power generation (microgrids, batteries), advanced cooling technologies, and co-packaged optics to effectively manage power spikes, reduce conversion losses, and support rack densities exceeding 100kW,” said Franco Chiam, VP of Cloud, Datacenter, Telecommunication, and Infrastructure Research Group at IDC Asia Pacific. “This collective industry response to adapt to the needs for higher power and thermal demands is further reinforced by leading vendors and hyperscalers aligning around open standards, facilitating scalable, gigawatt-class datacenter deployments,” Chiam added. Networking takes center stage Networking is emerging as a central component of AI infrastructure, as platforms such as Vera Rubin place greater emphasis on how data moves across systems rather than treating connectivity as a supporting layer.

Read More »

Available’s $5B Project Qestrel aims to roll out 1,000 AI-ready edge data centers by year’s end

Available is partnering with wireless infrastructure company Crown Castle, which owns, operates, and leases more than 40,000 cell towers and roughly 90,000 miles of fiber. “Our strategy is to industrialize and modularize deployment by building on telecom co-location and pre-existing physical infrastructure rather than greenfield hyperscale construction,” said Medina. Some initial sites are live (the company declined to say how many, due to “final contractual and commissioning milestones”) and 30 cities are expected to come online by early July. Available is prioritizing dense urban corridors, and early adoption has begun in “major Northeast corridors with a path to nationwide rollout,” Medina explained. The company’s infrastructure will be used by Strata Expanse, which specializes in 60 to 90 day AI data center deployments, and incorporated into Strata’s new full-stack, end-to-end Amphix AI Infrastructure Platform. The neocloud architecture will run up to 48 GPUs per site, bringing AI inferencing to the edge. Many sites will be pre-integrated with IBM’s watsonx; others will be AI-agnostic, allowing enterprises to run their preferred models. According to Available, Project Qestrel will provide:

Read More »

Cisco extends its Secure AI Factory with Nvidia

“Customers can now control and manage this environment and operate it like it was a traditional data center fabric,” Wollenweber said. “The ability to bring it under the same Nexus umbrella is actually a huge selling point for AI customers, because their IT infrastructure folks, their operational people that are running the network, already understand how to use these Nexus tools, and so they can now add AI workloads and kind of accelerated computing technologies like GPUs, but in that same Nexus umbrella,” Wollenweber said.  “As Al becomes operational and distributed, complexity becomes the enemy of scale. Fragmented architectures force customers to manage integration, policy enforcement, observability, and security across silos, increasing cost and slowing innovation,” said Wollenweber. “Architecting silicon, networking, compute, security, and Al software into a cohesive system gives organizations a unified operating model, stronger performance guarantees, and embedded trust.” Those are the driving ideas around Cisco Secure AI Factory with Nvidia, Wollenweber said. Introduced a year ago, Secure AI Factory with Nvidia integrates Cisco’s Hypershield and AI Defense packages to help protect the development, deployment, and use of AI models and applications. Hypershield uses AI to dynamically refine security policies based on application identity and behavior. It automates policy creation, optimization, and enforcement across workloads. AI Defense discovers the various models being used in a customer’s AI development and uses four features to help customers enforce AI protection: AI access, AI cloud visibility, AI model and application validation, and AI runtime protection. Cisco integrates Hybrid Mesh Firewall technology On the security side, Cisco said it will embed its Hybrid Mesh Firewall technology to allow for security policy enforcement on Nvidia BlueField data processing units (DPU) that are embedded in Nvidia GPU servers connected to Cisco Nexus One fabrics. Cisco Hybrid Mesh Firewall offers a distributed security fabric

Read More »

Middle East war fosters concerns about physical data center security

The most common issue that Guidepost talks about with its clients is insider threats, which can be anyone that is rightfully permitted into your data center. Data centers have very strict rules regarding movement of visitors, but employees pretty much have free rule of the place. “Insider threat could be someone simply putting a USB stick in a server or having access to a data device that they’re not supposed to,” he said. “A threat actor could potentially cause harm within the facility, whether that’s mechanical, electrical, plumbing spaces or the data halls themselves is our number one preventative item that we’re trying to thwart.” When it comes to external threats, Guidepost looks after vehicle-borne IEDs and vehicle ramming, even if it’s accidental. That’s why data centers have high, anti-climb perimeter fences, multi-layered gates. and vehicle barriers that are put in place help to prevent any unwanted vehicles outside of the facility. “It’s a lot of what we call Crime Prevention Through Environmental Design,” said Bekisz. “It’s a theory that we utilize in our industry for ensuring that we are detecting and thwarting individuals before they are willing to commit some type of offensive action or some type of unwanted behavior.” That includes simple things like lighting right or reducing the visibility of the data center through shrubs and trees and berms and using that in consortium with physical preventative devices. Drones are a growing problem, even if they are not being used in kamikaze attacks. Bekisz said the only thing you can do is put in drone detection, so you have some type of device in the air in the area of your facility, and then you call for support from local emergency services.

Read More »

Palantir partners with Nvidia to streamline AI data center deployment

This collaboration grants enterprises full control over their data, AI models, and applications while supporting the use of open-source AI models and related data acceleration tools. The Palantir AI OS reference architecture gives enterprises total control over their data, AI models and applications. It is particularly critical for customers with existing GPU infrastructure, latency-sensitive workflows, data sovereignty requirements, and high geographic distribution. “From our first deployment with the United States government and in every deployment since, our software has had to meet the moment in the most complex and sensitive environments where customers must maintain control,” says Akshay Krishnaswamy, Palantir’s chief architect in a statement. “Together with Nvidia — and building on many customers’ existing investments — we are proud to deliver a fully integrated AI operating system that is optimized for Nvidia accelerated compute infrastructure and enables customers to realize the promise of on-premises, edge, and sovereign cloud deployments,” he added. Sovereign AI is an emerging market that represents a country’s efforts to develop and maintain control of its own AI, using its own data, and keeping the data within its borders.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »