Stay Ahead, Stay ONMINE

NVIDIA’s Rubin Redefines the AI Factory

The Architecture Shift: From “GPU Server” to “Rack-Scale Supercomputer” NVIDIA’s Rubin architecture is built around a single design thesis: “extreme co-design.” In practice, that means GPUs, CPUs, networking, security, software, power delivery, and cooling are architected together; treating the data center as the compute unit, not the individual server. That logic shows up most clearly […]

The Architecture Shift: From “GPU Server” to “Rack-Scale Supercomputer”

NVIDIA’s Rubin architecture is built around a single design thesis: “extreme co-design.” In practice, that means GPUs, CPUs, networking, security, software, power delivery, and cooling are architected together; treating the data center as the compute unit, not the individual server.

That logic shows up most clearly in the NVL72 system. NVLink 6 serves as the scale-up spine, designed to let 72 GPUs communicate all-to-all with predictable latency, something NVIDIA argues is essential for mixture-of-experts routing and synchronization-heavy inference paths.

NVIDIA is not vague about what this requires. Its technical materials describe the Rubin GPU as delivering 50 PFLOPS of NVFP4 inference and 35 PFLOPS of NVFP4 training, with 22 TB/s of HBM4 bandwidth and 3.6 TB/s of NVLink bandwidth per GPU.

The point of that bandwidth is not headline-chasing. It is to prevent a rack from behaving like 72 loosely connected accelerators that stall on communication. NVIDIA wants the rack to function as a single engine because that is what it will take to drive down cost per token at scale.

The New Idea NVIDIA Is Elevating: Inference Context Memory as Infrastructure

If there is one genuinely new concept in the Rubin announcements, it is the elevation of context memory, and the admission that GPU memory alone will not carry the next wave of inference.

NVIDIA describes a new tier called NVIDIA Inference Context Memory Storage, powered by BlueField-4, designed to persist and share inference state (such as KV caches) across requests and nodes for long-context and agentic workloads. NVIDIA says this AI-native context tier can boost tokens per second by up to 5× and improve power efficiency by up to 5× compared with traditional storage approaches.

The implication is clear: the path to cheaper inference is not just faster GPUs. It is treating inference state as a reusable asset; one that can live beyond a single GPU’s memory window, be accessed quickly, and eliminate redundant computation.

If the Blackwell generation was about feeding the transformer, Rubin adds the ability to remember what has already been fed, and to use that memory to make everything run faster.

Security and Reliability Move From Checkboxes to Rack-Scale Requirements

Two themes run consistently through the Rubin announcements: confidential computing and resilience.

NVIDIA says Rubin introduces third-generation confidential computing, positioning Vera Rubin NVL72 as the first rack-scale platform to deliver NVIDIA Confidential Computing across CPU, GPU, and NVLink domains.

It also highlights a second-generation RAS engine spanning GPU, CPU, and NVLink, with real-time health monitoring, fault tolerance, and modular, “cable-free” trays designed to speed servicing.

This is the AI factory idea expressed in operational terms. NVIDIA is treating uptime and serviceability as performance features. Because in a world of continuous inference, every minute of downtime is a direct tax on token economics.

Networking Gets a Starring Role: Co-Packaged Optics and AI-Native Ethernet

Rubin treats networking not as a support layer, but as a primary design constraint.

On the Ethernet side, NVIDIA is pushing Spectrum-X Ethernet Photonics and co-packaged optics, claiming up to 5× improvements in power efficiency along with gains in reliability and uptime for AI workloads.

In its technical materials, NVIDIA describes Spectrum-6 as a 102.4 Tb/s switch chip engineered for the synchronized, bursty traffic patterns common in large AI clusters, and shows architectures that scale into the hundreds of terabits per second using co-packaged optics.

What is notable is not just what NVIDIA is promoting, but what it is no longer centering. InfiniBand, long the default fabric for high-performance AI and HPC clusters, still appears in the Rubin ecosystem, but it no longer dominates the narrative. Ethernet, long treated as the “good enough” alternative, is being recast as an AI-native fabric when paired with photonics, co-packaged optics, and tight system co-design.

The logic is simple. At scale, the network becomes either the accelerator or the brake. Rubin is NVIDIA’s attempt to remove the brake by making Ethernet behave like an AI-native fabric, rather than a generic data network.

DGX SuperPOD as the “Production Blueprint,” Not a Reference Design

NVIDIA is positioning DGX SuperPOD as the standard blueprint for scaling Rubin, as opposed to a conceptual reference design.

In its Rubin materials, the SuperPOD is defined by the integration of NVL72 or NVL8 systems, BlueField-4 DPUs, the Inference Context Memory Storage platform, ConnectX-9 networking, Quantum-X800 InfiniBand or Spectrum-X Ethernet fabrics, and Mission Control operations software.

NVIDIA also anchors that blueprint with a concrete scale marker. A SuperPOD configuration built from eight DGX Vera Rubin NVL72 systems (576 Rubin GPUs in total) is described as delivering 28.8 exaflops of FP4 performance and 600 TB of fast memory.

This is not marketing theater. It is NVIDIA spelling out what a “minimum viable AI factory module” looks like in procurement terms: standardized racks, pods, and operating tooling, rather than one-off integration projects.

How Rubin Stacks Up: What Changes Versus Blackwell

NVIDIA’s own comparisons return to three core differences, all centered on cost and utilization:

  • Economics lead the message: Up to 10× lower inference token cost and 4× fewer GPUs required for mixture-of-experts training versus Blackwell.

  • Rack-scale coherence: NVLink 6 and NVL72 are positioned as the way to keep utilization high as model parallelism, expert routing, and long-context inference become more complex.

  • Infrastructure tiers for inference: The new context memory storage tier is an explicit acknowledgment that next-generation inference is about state management and reuse, not just raw compute.

If Blackwell marked the peak of the “bigger GPU, bigger box” era, Rubin is NVIDIA’s attempt to make the rack the computer, and to make operations part of the architecture itself.

What Rubin Signals for AI Compute in 2026 and Beyond

The center of gravity in AI is shifting from a race to train the biggest models to the industrialization of inference.

Rubin aligns NVIDIA’s roadmap with a world building products, agents, and services that run continuously, not occasional demos. Cost per token becomes the metric CFOs understand, and NVIDIA is positioning itself to own that number the way it once owned training performance.

Data centers will increasingly be designed around AI racks, not the other way around.

Between liquid-cooling assumptions, serviceability as a design requirement, and photonics-heavy networking, Rubin points toward a more specialized facility baseline: one that looks less like a generic data hall and more like a purpose-built factory floor for tokens. The AI factory will not be a slogan; it will define the generation.

The platform story also tightens NVIDIA’s grip, because the system itself becomes the differentiator. With six interdependent chips, rack-scale security, AI-native context storage, and Mission Control operations, Rubin is intentionally hard to unbundle. Even if competitors match a GPU on raw math, NVIDIA is betting they will not match a fully integrated AI factory on cost or performance.

And once inference state is treated as shared infrastructure, storage and networking decisions change. Economic value shifts toward systems that can reuse context efficiently, keeping GPUs busy without re-computing the world every session.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Lenovo bolsters hybrid AI platform with Nvidia GPUs

Lenovo and Nvidia meld Lenovo’s AI inferencing platforms with Nvidia Dynamo and NIM,  as well as Nvidia’s Vera Rubin NVL72 for Lenovo AI Cloud gigafactory. Citing the CIO Playbook 2026 — commissioned by Lenovo and conducted by IDC –, 84% of organizations expect to run AI across on-premises or edge

Read More »

Fortinet’s AI-driven defense for a machine-speed era

While the customer panels and WEF sessions were separate events, they aligned with my takeaways from WEF’s Davos 2026 event. AI has the power to change the world, but threat actors are using it now to find new ways of breaching organizations. Organizations are looking to simplify their cybersecurity by

Read More »

Energy Department Announces $500 Million to Strengthen Domestic Critical Materials Processing and Manufacturing

 Funding will expand domestic manufacturing of battery supply chains for defense, grid resilience, transportation, manufacturing and other industries WASHINGTON—The U.S. Department of Energy’s (DOE) Office of Critical Minerals and Energy Innovation (CMEI) today announced a Notice of Funding Opportunity (NOFO) for up to $500 million to expand U.S. critical mineral and materials processing and derivative battery manufacturing and recycling. Assistant Secretary of Energy (EERE) Audrey Robertson is currently in Japan meeting with regional allies at the Indo-Pacific Energy Security Ministerial and Business Forum (IPEM) to advance shared efforts on supply chain resilience and energy security issues. Her engagements at IPEM underscore the importance of close cooperation with partners as the United States strengthens its supply chain through this NOFO. “For too long, the United States has relied on hostile foreign actors to supply and process the critical materials that are essential in battery manufacturing and materials processing,” said U.S. Energy Secretary Chris Wright. “Thanks to President Trump’s leadership, the Department of Energy is playing a leading role in strengthening these domestic industries that will position the U.S. to win the AI race, meeting rising energy demand, and achieve energy dominance.” “I am delighted to be in Japan meeting with our allies, underscoring the important connection between critical materials and energy security,” said Assistant Secretary of Energy (EERE) Audrey Robertson. “Critical minerals processing is a vital component of our nation’s critical minerals supply base. Boosting domestic production, including through recycling, will bolster national security and ensure the United States and our partners are prepared to meet the energy challenges of the 21st century.” Funding awarded through this NOFO will support demonstration and/or commercial facilities for processing, recycling, or utilizing for manufacturing of critical materials which may include traditional battery minerals such as lithium, graphite, nickel, copper, aluminum, as well as other

Read More »

Energy Department Announces $293 Million in Funding to Support Genesis Mission National Science and Technology Challenges

WASHINGTON—The U.S. Department of Energy (DOE) today announced funding to advance the Genesis Mission’s efforts to tackle the nation’s most complex science and technology challenges. This includes a $293 million Request for Application (RFA),“The Genesis Mission: Transforming Science and Energy with AI.” Through this RFA, DOE invites interdisciplinary teams to leverage novel AI models and frameworks to address over 20 national challenges spanning advanced manufacturing, biotechnology, critical materials, nuclear energy, and quantum information science.    “The Genesis Mission has caught the imagination of our scientific and engineering communities to tackle national challenges in the age of AI,” said Under Secretary for Science Darío Gil and Genesis Mission Director. “With these investments we seek breakthrough ideas and novel collaborations leveraging the scientific prowess of our National Laboratories, the private sector, universities, and science philanthropies.”  The RFA is open to interdisciplinary teams from DOE National Laboratories, U.S. industry, and academia. Phase I awards will range from $500,000 to $750,000 and will support a nine month project period. Phase II awards will range from $6 million to $15 million over a three year project period. Teams may apply directly to either phase in FY 2026, and successful Phase I teams will be eligible to compete for larger Phase II awards in future cycles. Phase I applications and Phase II letters of intent are due April 28, 2026. Phase II applications are due May 19, 2026. DOE plans to hold an informational webinar about this RFA on March 26, 2026.  For full eligibility, application instructions, and challenge details, see the official NOFO: DE-FOA-0003612. Registration instructions and other details will be posted here.  ### 

Read More »

Trump Administration Keeps Coal Plant Open to Ensure Affordable, Reliable and Secure Power in the Northwest

Emergency order addresses critical grid reliability issues, lowering risk of blackouts and ensuring affordable electricity access. WASHINGTON—U.S. Secretary of Energy Chris Wright today issued an emergency order to ensure Americans in the Northwestern region of the United States have access to affordable, reliable and secure electricity. The order directs TransAlta to keep Unit 2 of the Centralia Generating Station in Centralia, Washington available to operate. Unit 2 of the coal plant was scheduled to shut down at the end of 2025. The reliable supply of power from the Centralia plant is essential to maintaining grid stability across the Northwest, and this order ensures that the region avoids unnecessary blackout risks and costs. “The last administration’s energy subtraction policies had the United States on track to likely experience significantly more blackouts in the coming years — thankfully, President Trump won’t let that happen,” said Energy Secretary Wright. “The Trump administration will continue taking action to keep America’s coal plants running so we can stop the price spikes and ensure we don’t lose critical generation sources. Americans deserve access to affordable, reliable, and secure energy to power their homes all the time, regardless of whether the wind is blowing or the sun is shining.” Thanks to President Trump’s leadership, coal plants across the country are reversing plans to shut down. On December 16, 2025, Secretary Wright issued an emergency order directing TransAlta to keep Unit 2 (729.9 MW) available to operate.According to DOE’s Resource Adequacy Report, blackouts were on track to potentially increase 100 times by 2030 if the U.S. continued to take reliable power offline as it did during the Biden administration. This order is in effect beginning on March 17, 2026, through June 14, 2026. ### 

Read More »

Brent retreats from highs after Trump signals Iran war nearing end

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } Oil futures eased from recent highs Tuesday as markets reacted to comments from US President Donald Trump suggesting the war with Iran may be nearing its conclusion, easing concerns about prolonged disruptions to Middle East crude supplies. Brent crude had climbed above $100/bbl amid escalating tensions in the region and fears that the war could prolong disruptions to shipments through the Strait of Hormuz—one of the world’s most critical energy chokepoints and a transit route for roughly one-fifth of global oil supply. Prices pulled back after Pres. Trump said the war was “almost done,” prompting traders to reassess the risk premium that had built into crude markets during the latest escalation. The earlier gains were driven by the fact that the war had disrupted tanker traffic in the Strait of Hormuz, raising concerns about wider supply disruptions from major Gulf oil producers. While the latest remarks helped calm markets, analysts note that geopolitical risks remain elevated and price volatility is likely to persist as traders monitor developments in the region. Any renewed escalation could quickly send crude prices higher again.

Read More »

Southwest Arkansas lithium project moves toward FID with 10-year offtake deal

Smackover Lithium, a joint venture between Standard Lithium Ltd. and Equinor, through subsidiaries of Equinor ASA, signed the first commercial offtake agreement for the South West Arkansas Project (SWA Project) with commodities group Trafigura Trading LLC. Under the terms of a binding take-or-pay offtake agreement, the JV will supply Trafigura with 8,000 metric tonnes/year (tpy) of battery-quality lithium carbonate (Li2CO3) over a 10-year period, beginning at the start of commercial production. Smackover Lithium is expected to achieve final investment decision (FID) for the project, which aims to use direct lithium extraction technology to produce lithium from brine resources in the Smackover formation in southern Arkansas, in 2026, with first production anticipated in 2028. The project encompasses about 30,000 acres of brine leases in the region, with the initial phase of project development focused on production from the 20,854-acre Reynolds Brine Unit.   Front-end engineering design was completed in support of a definitive feasibility study with a principal recommendation that the project is ready to progress to FID.  While pricing terms of the Trafigura deal were kept confidential, Standard Lithium said they are “structured to support the anticipated financing for the project.” The JV is seeking to finalize customer offtake agreements for roughly 80% of the 22,500 tonnes of annual nameplate lithium carbonate capacity for the initial phase of the project. This agreement represents over 40% of the targeted offtake commitments. Formed in 2024, Smackover Lithium is developing multiple DLE projects in Southwest Arkansas and East Texas. Standard Lithium is operator of the projecs with 55% interest. Equinor holds the remaining 45% interest.

Read More »

Equinor makes oil and gas discoveries in the North Sea

Equinor Energy AS discovered oil in the Troll area and gas and condensate in the Sleipner area of the North Sea. Byrding C discovery well 35/11-32 S in production license (PL) 090 HS was made 5 km northwest of Fram field in Troll. The well was drilled by the COSL Innovator rig in 373 m of water to 3,517 m TVD subsea. It was terminated in the Heather formation from the Middle Jurassic. The primary exploration target was to prove petroleum in reservoir rocks from the Late Jurassic deep marine equivalent to the Sognefjord formation. The secondary target was to prove petroleum and investigate the presence of potential reservoir rocks in two prospective intervals from the Middle Jurassic in deep marine equivalents to the Fensfjord formation. The well encountered a 22-m oil column in sandstone layers in the Sognefjord formation with a total thickness of 82 m, of which 70 m was sandstone with moderate to good reservoir properties. The oil-water contact was encountered. The secondary exploration target in the Fensfjord formation did not prove reservoir rocks or hydrocarbons. The well was not formation-tested, but data and samples were collected. The well has been permanently plugged. Preliminary estimates indicate the size of the discovery is 4.4–8.2 MMboe. Oil discovered in Byrding C will be produced using existing or future infrastructure in the area. The Frida Kahlo discovery was drilled from the Sleipner B platform in production license PL 046 northwest of Sleipner Vest and is estimated to contain 5–9 MMboe of gas and condensate. The well will be brought on stream as early as April. The four most recent exploration wells in the Sleipner area, drilled over a 3-month period, include Lofn, Langemann, Sissel, and Frida Kahlo. All have all proven gas and condensate in the Hugin formation, with combined estimated

Read More »

Microsoft’s laser-free cable tech promises to slash AI data center power bills in half

The power problem, Microsoft argues, starts with the cables themselves. How MOSAIC works Copper interconnects top out at roughly two meters at high data rates, limiting them to within a single rack. Laser-based fiber optic cables go further but consume more power and are sensitive to temperature and dust, Microsoft said in the post. MOSAIC reaches up to 50 meters while drawing less power than either, the company added. “Imaging fiber looks like a standard fiber, but inside it has thousands of cores,” Paolo Costa, a Microsoft partner research manager and the project’s lead researcher, wrote in the post. “That was the missing piece. We finally had a way to carry thousands of parallel channels in one cable.” MOSAIC is not Microsoft’s only optical networking bet, and it is not the one furthest along. HCF is already in production across Azure regions MOSAIC arrives alongside Hollow Core Fiber (HCF), a complementary technology Microsoft is already deploying globally. HCF carries optical signals through air rather than glass, delivering up to 47% faster data transmission and 33% lower latency than conventional single-mode fiber, according to published research from the University of Southampton cited by Microsoft. Frank Rey, Microsoft’s general manager of Azure Hyperscale Networking, said in the post that the two technologies are complementary — HCF for long-distance inter-datacenter links, MOSAIC for in-facility GPU and server connectivity.

Read More »

Beyond the fan: Crossing the liquid cooling rubicon

At 20 kW per rack, the airflow velocity required to maintain safe operating temperatures triggers two failure modes. First, the acoustic vibration becomes severe enough to damage equipment. Organizations learn this lesson the hard way — high-frequency vibration from upgraded CRAC units causing bit errors in high-density Non-Volatile Memory Express (NVMe) storage arrays. The signature is mechanical resonance in drive enclosures. Fans shake storage infrastructure to death. Second, the power required for that airflow becomes self-defeating. At 100 kW densities, nearly 30 percent of the total facility power goes to fans alone — before accounting for compressors and chillers working overtime to cool the air. According to Uptime Institute research, data centers spend an estimated $1.9 to $2.8 million per MW annually on operations, with cooling-related costs consuming nearly $500,000 of that figure. The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) TC 9.9 guidelines governing data center thermal management were written for a 15 kW world. Many organizations now operate so far outside those parameters that the guidelines have become irrelevant. One moment crystallized this reality. A single CRAC unit failed in a training cluster. Within eight minutes, hot-aisle temperatures exceeded 120°F. Monitoring systems triggered automatic throttling on millions of dollars of compute infrastructure. A multi-day processing run crashed and restarted from a checkpoint. Standing in that sweltering aisle watching temperature readouts climb, the conclusion was inescapable: air had carried the industry as far as it could go. Crossing the Rubicon: Cold plates versus rear-door heat exchangers Bringing liquid into a data center is terrifying. Water — or water-adjacent fluids — enters rooms filled with equipment worth tens of millions of dollars. Equipment that fails catastrophically when wet. “Crossing the Rubicon” captures the commitment: once started down this path, there is no returning to the comfortable certainty of

Read More »

System-level ‘coopetition’: Why Nvidia’s DGX Rubin NVL8 runs on Intel Xeon 6

Not a strategic alliance Despite working together at the system level, the relationship between the two companies does not amount to a formal strategic alliance. “The Intel–Nvidia dynamic is best understood as system-level coopetition. Long-standing collaboration persists across data center and PC ecosystems, with Intel CPUs paired alongside Nvidia GPUs forming standardized AI server architectures and enabling deeper integration,” said Manish Rawat, semiconductor analyst at TechInsights. However, competition is accelerating structurally. Even though Nvidia dominates the GPU space, the company is also expanding its presence across more layers of the data-center stack. It has been developing its own CPUs, such as the Grace CPU, aimed at tighter integration between compute, memory, and interconnect. The company has also launched Vera CPU, purpose-built for agentic AI at GTC 2026. This reflects Nvidia’s broader approach of building more of the system in-house, spanning both hardware and software, even as it continues to incorporate external components where required. “Nvidia’s push into CPUs (Grace, Vera) and tightly integrated, NVLink-based systems signals a shift toward full-stack ownership spanning compute, networking, and software. This challenges Intel’s traditional dominance in CPUs and system control. In essence, Nvidia is partnering tactically to sustain ecosystem adoption while strategically positioning to displace incumbents and capture greater control of next-generation AI infrastructure,” added Rawat.

Read More »

Nvidia announces Vera Rubin platform, signaling a shift to full-stack AI infrastructure

The transition reflects a deeper move from optimizing individual components to engineering entire systems for scalability and efficiency, said Sanchit Vir Gogia, chief analyst at Greyhound Research. “Compute, memory behavior, interconnect bandwidth, and workload orchestration are being engineered together,” Gogia said. “Even physical design choices such as rack modularity, serviceability, and assembly efficiency are now part of performance engineering. Infrastructure is beginning to resemble an appliance at scale, but one that operates at extreme density and complexity.” Industry observers said rack-scale systems, including Nvidia’s NVL72 and open standards such as OCP Open Rack, are enabling more flexible pooling and orchestration of infrastructure resources for AI and machine learning workloads. “I am also seeing other operators are increasingly adopting chip-to-grid strategies, integrating onsite power generation (microgrids, batteries), advanced cooling technologies, and co-packaged optics to effectively manage power spikes, reduce conversion losses, and support rack densities exceeding 100kW,” said Franco Chiam, VP of Cloud, Datacenter, Telecommunication, and Infrastructure Research Group at IDC Asia Pacific. “This collective industry response to adapt to the needs for higher power and thermal demands is further reinforced by leading vendors and hyperscalers aligning around open standards, facilitating scalable, gigawatt-class datacenter deployments,” Chiam added. Networking takes center stage Networking is emerging as a central component of AI infrastructure, as platforms such as Vera Rubin place greater emphasis on how data moves across systems rather than treating connectivity as a supporting layer.

Read More »

Available’s $5B Project Qestrel aims to roll out 1,000 AI-ready edge data centers by year’s end

Available is partnering with wireless infrastructure company Crown Castle, which owns, operates, and leases more than 40,000 cell towers and roughly 90,000 miles of fiber. “Our strategy is to industrialize and modularize deployment by building on telecom co-location and pre-existing physical infrastructure rather than greenfield hyperscale construction,” said Medina. Some initial sites are live (the company declined to say how many, due to “final contractual and commissioning milestones”) and 30 cities are expected to come online by early July. Available is prioritizing dense urban corridors, and early adoption has begun in “major Northeast corridors with a path to nationwide rollout,” Medina explained. The company’s infrastructure will be used by Strata Expanse, which specializes in 60 to 90 day AI data center deployments, and incorporated into Strata’s new full-stack, end-to-end Amphix AI Infrastructure Platform. The neocloud architecture will run up to 48 GPUs per site, bringing AI inferencing to the edge. Many sites will be pre-integrated with IBM’s watsonx; others will be AI-agnostic, allowing enterprises to run their preferred models. According to Available, Project Qestrel will provide:

Read More »

Cisco extends its Secure AI Factory with Nvidia

“Customers can now control and manage this environment and operate it like it was a traditional data center fabric,” Wollenweber said. “The ability to bring it under the same Nexus umbrella is actually a huge selling point for AI customers, because their IT infrastructure folks, their operational people that are running the network, already understand how to use these Nexus tools, and so they can now add AI workloads and kind of accelerated computing technologies like GPUs, but in that same Nexus umbrella,” Wollenweber said.  “As Al becomes operational and distributed, complexity becomes the enemy of scale. Fragmented architectures force customers to manage integration, policy enforcement, observability, and security across silos, increasing cost and slowing innovation,” said Wollenweber. “Architecting silicon, networking, compute, security, and Al software into a cohesive system gives organizations a unified operating model, stronger performance guarantees, and embedded trust.” Those are the driving ideas around Cisco Secure AI Factory with Nvidia, Wollenweber said. Introduced a year ago, Secure AI Factory with Nvidia integrates Cisco’s Hypershield and AI Defense packages to help protect the development, deployment, and use of AI models and applications. Hypershield uses AI to dynamically refine security policies based on application identity and behavior. It automates policy creation, optimization, and enforcement across workloads. AI Defense discovers the various models being used in a customer’s AI development and uses four features to help customers enforce AI protection: AI access, AI cloud visibility, AI model and application validation, and AI runtime protection. Cisco integrates Hybrid Mesh Firewall technology On the security side, Cisco said it will embed its Hybrid Mesh Firewall technology to allow for security policy enforcement on Nvidia BlueField data processing units (DPU) that are embedded in Nvidia GPU servers connected to Cisco Nexus One fabrics. Cisco Hybrid Mesh Firewall offers a distributed security fabric

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »