
Ethically trained AI startup Pleias releases new small reasoning models optimized for RAG with built-in citations

French AI startup Pleias made waves late last year with the launch of its ethically trained Pleias 1.0 family of small language models — among the first and only to date to be built entirely on scraping “open” data, that is, data explicitly labeled as public domain, open source, or unlicensed and not copyrighted.

Now the company has announced the release of two open source small-scale reasoning models designed specifically for retrieval-augmented generation (RAG), citation synthesis, and structured multilingual output.

The launch includes two core models — Pleias-RAG-350M and Pleias-RAG-1B — each also available in CPU-optimized GGUF format, making a total of four deployment-ready variants.

All are based on Pleias 1.0, and can be used independently or in conjunction with other LLMs an organization already uses or plans to deploy. All appear to be available under the permissive Apache 2.0 open source license, meaning organizations are free to take, modify, and deploy them for commercial use cases.

RAG, as you’ll recall, is the widely used technique that enterprises and organizations can deploy to connect an AI large language model (LLM) such as OpenAI’s GPT-4o, Google’s Gemini 2.5 Flash, Anthropic’s Claude 3.7 Sonnet or Cohere’s Command-A, or open source alternatives like Llama 4 and DeepSeek V3, to external knowledge bases, such as enterprise documents and cloud storage.

This is often necessary for enterprises that want to build chatbots and other AI applications that reference their internal policies or product catalogs (the alternative, prompting a long-context LLM with all the necessary information, may not be suitable for enterprise use cases where security and per-token transmission costs are concerns).
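The RAG pattern described above can be sketched in a few lines: retrieve the most relevant chunks from an internal knowledge base, then ground the model's prompt in them. The sketch below uses naive keyword-overlap retrieval purely for illustration; real deployments typically use embedding-based search, and all data and names here are invented.

```python
def score(query: str, chunk: str) -> int:
    """Count query words that also appear in the chunk (case-insensitive)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks ranked by keyword overlap with the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, sources: list[str]) -> str:
    """Assemble a grounded prompt: numbered sources, then the question."""
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return f"Answer using only these sources:\n{numbered}\n\nQuestion: {query}"

# Toy internal knowledge base standing in for enterprise documents.
kb = [
    "Employees accrue 25 days of paid leave per year.",
    "The cafeteria is open from 11:30 to 14:00 on weekdays.",
    "Paid leave requests must be filed two weeks in advance.",
]
question = "How many days of paid leave do employees get?"
prompt = build_prompt(question, retrieve(question, kb))
```

The resulting prompt, rather than the bare question, is what gets sent to the LLM, which is why per-token transmission costs matter at scale.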

The Pleias-RAG model family is the latest effort to bridge the gap between accuracy and efficiency in small language models.

These models are aimed at enterprises, developers, and researchers looking for cost-effective alternatives to large-scale language models without compromising traceability, multilingual capabilities, or structured reasoning workflows.

The target userbase is actually Pleias’s home continent of Europe, as co-founder Alexander Doria told VentureBeat via direct message on the social network X:

“A primary motivation has been the difficulty of scaling RAG applications in Europe. Most private organization have little GPUs (it may have changed but not long ago less than 2% of all [Nvidia] H100 [GPUs] were in Europe). And yet simultaneously there are strong incentive to self-host for regulated reasons, including GDPR.

SLMs have progressed significantly over the past year, yet they are too often conceived as ‘mini-chatbots’ and we have observed a significant drop of performance in non-English languages, both in terms of source understanding and quality of text generation. So we have been satisfied to hit most of our objectives:

  • An actual alternative to 7-8b models for RAG even on CPU and other constrained infras.
  • Fully verifiable models coming with citation support.
  • Preservation of European language performance.”

Of course, because the models are open source under the Apache 2.0 license, anyone can take and use them freely anywhere in the world.

Focused on grounding, citations, and facts

A key feature of the new Pleias-RAG models is their native support for source citation with literal quotes, fully integrated into the model’s inference process.

Unlike post-hoc citation methods or external chunking pipelines, the Pleias-RAG models generate citations directly, using a syntax inspired by Wikipedia’s reference format.

This approach allows for shorter, more readable citation snippets while maintaining verifiability.
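Because citations are emitted inline with literal quotes, downstream code can extract and verify them mechanically. The sketch below assumes a Wikipedia-style tag of the form `<ref name="source_1">"quoted passage"</ref>`; the exact syntax is an illustrative assumption, not Pleias's documented format.

```python
import re

# Assumed citation format (hypothetical): <ref name="source_1">"quote"</ref>
CITATION = re.compile(r'<ref name="([^"]+)">"([^"]*)"</ref>')

def extract_citations(answer: str) -> list[tuple[str, str]]:
    """Return (source_id, quoted_text) pairs found in a model answer."""
    return CITATION.findall(answer)

def verify(citations: list[tuple[str, str]], sources: dict[str, str]) -> bool:
    """A citation is grounded only if its quote appears verbatim in the named source."""
    return all(quote in sources.get(src, "") for src, quote in citations)

answer = 'Leave is generous <ref name="source_1">"25 days of paid leave"</ref>.'
sources = {"source_1": "Employees accrue 25 days of paid leave per year."}
cites = extract_citations(answer)
```

Verbatim-quote citations are what make this kind of algorithmic checking possible; paraphrased citations cannot be verified by simple string matching.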

Citation grounding plays a functional role in regulated settings.

For sectors like healthcare, legal, and finance — where decision-making must be documented and traceable — these built-in references offer a direct path to auditability. Pleias positions this design choice as an ethical imperative, aligning with increasing regulatory demands for explainable AI.

Proto-agentic?

Pleias-RAG models are described as “proto-agentic” — they can autonomously assess whether a query is understandable, determine if it is trivial or complex, and decide whether to answer, reformulate, or refuse based on source adequacy.

Their structured output includes language detection, query and source analysis reports, and a reasoned answer.
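A structured output like this is straightforward for an application to consume. The sketch below parses a labeled multi-section response into a dictionary; the `## ` section markers and field names are hypothetical stand-ins, not Pleias's actual output format.

```python
def parse_report(raw: str) -> dict[str, str]:
    """Split a structured model response into its labeled sections."""
    sections: dict[str, str] = {}
    current = None
    for line in raw.splitlines():
        if line.startswith("## "):  # hypothetical section marker
            current = line[3:].strip().lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {k: v.strip() for k, v in sections.items()}

# Invented example of the kind of structured response described above.
raw = """## Language
fr
## Query Analysis
Trivial factual query; sources are adequate.
## Answer
Les employés ont 25 jours de congés payés."""
report = parse_report(raw)
```

An application could branch on the analysis section, for example surfacing a refusal to the user instead of a hallucinated answer when sources are judged inadequate.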

Despite their relatively small size (Pleias-RAG-350M has just 350 million parameters), the models exhibit behavior traditionally associated with larger, agentic systems.

According to Pleias, these capabilities stem from a specialized mid-training pipeline that blends synthetic data generation with iterative reasoning prompts.

Pleias-RAG-350M is explicitly designed for constrained environments. It performs well on standard CPUs, including mobile-class infrastructure.

According to internal benchmarks, the unquantized GGUF version produces complete reasoning outputs in roughly 20 seconds on 8GB RAM setups. Its small footprint places it in a niche with very few competitors, such as Qwen-0.5 and SmolLM, but with a much stronger emphasis on structured source synthesis.

Competitive performance across tasks and languages

In benchmark evaluations, Pleias-RAG-350M and Pleias-RAG-1B outperform most open-weight models under 4 billion parameters, and even rival larger models such as Llama-3.1-8B and Qwen-2.5-7B, on tasks such as HotPotQA, 2WikiMultiHopQA, and MuSiQue.

These multi-hop RAG benchmarks test the model’s ability to reason across multiple documents and identify distractors — common requirements in enterprise-grade knowledge systems.

The models’ strength extends to multilingual scenarios. On translated benchmark sets across French, German, Spanish, and Italian, the Pleias models show negligible degradation in performance.

This sets them apart from other SLMs, which typically experience a 10–35% performance loss when handling non-English queries.

The multilingual support stems from careful tokenizer design and synthetic adversarial training that includes language-switching exercises. The models not only detect the language of a user query but aim to respond in the same language—an important feature for global deployments.

In addition, Doria highlighted how the models could be used to augment the performance of other existing models an enterprise may already be using:

“We envision the models to be used in orchestration setting, especially since their compute cost is low. A very interesting results on the evaluation side: even the 350m model turned out to be good on entirely different answers than the answers [Meta] Llama and [Alibaba] Qwen were performing at. So there’s a real complementarity we attribute to our reasoning pipeline, that goes beyond cost-effectiveness…”
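The orchestration setting Doria describes can be sketched as a simple routing policy: run the cheap small model first and escalate to a larger LLM only when it declines to answer. Both model calls below are stubs, and refusal detection via a sentinel string is an illustrative assumption, not how the Pleias models signal a refusal.

```python
def small_rag_model(query: str) -> str:
    """Stub for a Pleias-RAG-style small model; refuses off-topic queries."""
    if "leave" in query.lower():
        return "Employees get 25 days of paid leave."
    return "REFUSE: sources inadequate"

def large_llm(query: str) -> str:
    """Stub for an expensive fallback model."""
    return f"(large-model answer to: {query})"

def orchestrate(query: str) -> str:
    """Try the small model first; escalate to the large model only on refusal."""
    answer = small_rag_model(query)
    if answer.startswith("REFUSE"):
        return large_llm(query)
    return answer
```

Because the small model's compute cost is low, every query it handles successfully is a query the expensive model never sees.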

Open access and licensing

According to Doria and a technical paper detailing the training of the Pleias-RAG family, the models were trained on: “Common Corpus to create the RAG training set (all the 3 million examples came from it). We used [Google] Gemma on top for generation of reasoning synthetic traces since the license allowed for reuse/retraining.”

Both models are released under the Apache 2.0 license, allowing for commercial reuse and integration into larger systems.

Pleias emphasizes the models’ suitability for integration into search-augmented assistants, educational tools, and user support systems. The company also provides an API library to simplify structured input-output formatting for developers.

The models’ release is part of a broader push by Pleias to reposition small LLMs as tools for structured reasoning, rather than as general-purpose conversational bots.

By leveraging an external memory architecture and systematic citation methods, the Pleias-RAG series offers a transparent, auditable alternative to more opaque frontier models.

Future outlook

Looking ahead, Pleias plans to expand the models’ capabilities through longer context handling, tighter search integration, and personality tuning for more consistent identity presentation.

Reinforcement learning is also being explored, particularly in domains like citation accuracy, where quote verification can be measured algorithmically.

The team is also actively collaborating with partners such as the Wikimedia Foundation to support targeted search integrations using trusted sources.

Ultimately, the current usage of RAG-specific implementations, models and workflows may fall away as more advanced AI models are trained and deployed, ones that incorporate RAG and agentic tool usage natively. As Doria told VentureBeat via DM:

Long term, my conviction is that both classic RAG pipeline and long context models are going to be disrupted by search agents. We have started to move in this direction: that’s why the model already comes equipped with many features that are currently externalized in RAG applications (query reformulation, reranking, etc.). We obviously aim to go further and integrate search capacities and source processing capacities directly in the model itself. My conviction is that RAG will disappear in a way as it gets automated by agentic models able to direct their own workflows.

With Pleias-RAG-350M and 1B, the company is betting that small models—when paired with strong reasoning scaffolding and verifiable outputs—can compete with much larger counterparts, especially in multilingual and infrastructure-limited deployments.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Asia-Pacific hits 50% IPv6 capability

Globally, the transition to IPv6 is advancing steadily, with 34% of networks now IPv6-capable. Not all IPv6-capable networks are using it by default; though: Capability means the system can use IPv6 — not that it prefers it. Still, the direction is clear. Countries like Vietnam (60% of networks IPv6-capable), Japan

Read More »

Linkerd 2.18 advances cloud-native service mesh

The project’s focus has evolved significantly over the years. While early adoption centered on mutual TLS between pods, today’s enterprises are tackling much larger challenges. “For a long time, the most common pattern was simply, ‘I want to get mutual TLS between all my pods, which gives me encryption, and

Read More »

18 essential commands for new Linux users

[jdoe@fedora ~]$ ls -ld /home/jdoedrwx——. 1 jdoe jdoe 106 Apr 3 14:39 /home/jdoe As you may have suspected, “r” stands for read, “w” means write and “x” is for execute. Note that no permissions are available for other group members and anyone else on the system. Each user will be

Read More »

Glen Earrach submits pumped hydro plans as poll reveals Highlanders support Loch Ness development

The developers behind plans for the 2 GW Glen Earrach pumped storage hydro (PSH) project have submitted a planning application to the Scottish government. Located at Balmacaan Estate in the Highlands, the Glen Earrach project will account for nearly three-quarters of the total PSH capacity planned for Loch Ness. The site’s 34 GWh capacity will provide power output equivalent to around 800 2.5 MW onshore wind turbines for up to 17 hours. Developer Glen Earrach Energy (GEE) said the project will be one of the largest and most efficient energy storage schemes in the UK once completed in 2030. GEE said the site’s unique topography and 500m gross hydraulic head allow for a more efficient design. The project is among a range of PSH projects in development in Scotland designed to store excess renewable energy and reduce the need for wind curtailment. Glen Earrach pumped storage hydro GEE estimates Glen Earrach will deliver a 10% in the carbon footprint of the UK grid and close to £2.9 billion in net system benefits over its first 20 years of operation. It will also support around 1,000 jobs during the peak of construction, as well as an annual £20m community benefit fund over its 125-year lifespan. GEE director Roderick Macleod said Glen Earrach will deliver the “most substantial community benefit fund ever in Scotland”. “The Highlands deserves the best project, and we remain on track to deliver it, with the first power being produced in 2030,” Macleod said. © Supplied by Glen Earrach EnergyA visualisation of plans for the Glen Earrach pumped storage hydropower project in the Scottish Highlands. “We’ve listened carefully to local views and will keep doing so. “Now we look forward to working with the Scottish Government, The Highland Council and all key stakeholders to deliver this vital project.” Backed by

Read More »

Oil Gains as Supply Tightness Counters Trade Concerns

Oil rose as producers’ promises to keep output growth in check added to signs of the physical market’s strength and a potential easing in trade tensions between the US and China. West Texas Intermediate futures climbed 0.8% to settle near $63 a barrel, while Brent advanced to close around $66.50, after President Donald Trump said that US officials have been holding meetings with Chinese officials on trade as recently as this morning. That countered some pessimism over the major crude importer’s earlier comments that the US should revoke all unilateral tariffs and its dismissal of speculation that progress has been made in bilateral communications. Those developments followed signs in recent days that lower oil prices are starting to curtail some producers’ spending plans. Already, metrics are pointing to a bullish near-term market, with the prompt spread for WTI hovering near the strongest in more than two months, an indication of tight supplies. Geopolitical tensions remain hot as well. Russia hit Ukraine with a barrage of missiles and drones overnight, killing at least nine people in the capital, as peace talks stalled over President Volodymyr Zelenskiy’s vow never to recognize Russian sovereignty over Crimea. The US is set to demand that Russia accept Ukraine’s right to maintain a military force. At one point, crude dipped into negative territory after Axios reported that Iran asked White House envoy Steve Witkoff whether the sides should negotiate an interim deal, potentially decreasing the risk of reduced flows from Tehran.   Oil has dropped sharply this month on concerns that US tariffs and counter-levies from its biggest trading partners will dent economic activity and hurt energy demand. Growing strain within OPEC+, particularly with perennial overproducer Kazakhstan, has stoked fears that output will continue to rise at a faster-than-advertised pace over the coming months. The Organization

Read More »

USA Clashes With Allies on Energy Security Vision at IEA Summit

At this week’s flagship international summit on energy security, the clashing visions of the US and its allies were on full display. Beginning with the event’s hosts — the International Energy Agency and the UK government — speaker after speaker at the London conference extolled the virtues of including renewable fuels in the shift to a sustainable energy future. In Britain, “there is an exciting vision of energy security and abundance from cheap, homegrown low-carbon power,” Energy Secretary Ed Miliband enthused in his opening address. IEA Executive Director Fatih Birol hailed the “remarkable” ascent of renewables, which last year accounted for 85% of new power generation globally. But the meeting’s tone shifted markedly when it was the turn of US Acting Assistant Secretary of Energy for International Affairs, Tommy Joyce. Joyce blamed the “embrace of climate politics” around the world for energy scarcity and causing “harm” to human lives, while boasting of America’s role as the world’s fourth-biggest producer of coal — the most polluting of fuels. He reiterated the Trump administration’s opposition to restricting energy sources in the pursuit of net zero carbon emissions, reprising a Republican party criticism leveled previously at the IEA’s energy forecasts. Of course, the division is hardly a surprise. President Donald Trump has dismissed climate change as a “hoax” and campaigned for re-election on pledges that America’s shale-oil explorers will “drill, baby, drill” the nation’s hydrocarbon bounty.  Nonetheless, when representatives of such conflicting outlooks share a stage, the contrast becomes sharp. The IEA’s summit will continue the rest of Thursday and Friday; reconciling the world-views of the agency’s biggest members will take significantly longer. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or

Read More »

Controversial plans for Kintore hydrogen plant backed by council despite local protests

Plans to create one of Europe’s largest hydrogen plants near Kintore have taken a step forward – despite calls for it to be thrown out. Statera Energy wants to build the massive 3GW Kintore Hydrogen project near Laylodge. Once constructed, it would be the largest site of its kind in the UK. It has been earmarked for land near the Kintore substation and a recently approved battery energy storage system. The site will produce green hydrogen at an electrolysis plant using surplus wind power generated from turbines and water from the River Don. Water from the river will also be used to cool equipment on the site and would later be returned back to the Don. Members of the Garioch area committee had called for the project to be scrapped last month over fears the Kintore area was becoming too industrialised. Historic Environment Scotland had also objected over fears it would harm the South Leylodge steading stone circle. Kintore hydrogen plant could ensure north-east is ‘global energy leader’ The application went before a council meeting today. It will ultimately be decided by the Scottish Government, but the local authority’s input will be a key consideration. Senior development manager for the Kintore project, William Summerlin, made a case for the hydrogen plant. He claimed the site would create “significant employment and economic opportunities” for the north-east and Scotland. Mr Summerlin also said that more than 3,000 jobs could be created during the construction period with over 300 operational jobs on site and in the supply chain. “Businesses up and down Aberdeenshire are standing ready to tender for this project,” he told the chamber. “Fabrication yards in Aberdeen and throughout the Shire are well-positioned to become assembly yards for electrolyser equipment. “Kintore makes use of abundant Scottish wind power converting it into

Read More »

BHP Prepares to Start Chief Executive Succession Process

BHP Group is preparing to begin looking for a new chief executive officer in the coming months, with key lieutenants already jostling for position to succeed Mike Henry at the top of the world’s biggest miner.  The understanding at BHP is that Henry is now heading toward the end of his tenure, according to company insiders. They emphasized that no decision has been made. But some people close to BHP say a change could come as soon as early next year, and some top executives have begun increasing their interaction with investors and other stakeholders ahead of a likely succession process. The internal frontrunners for the role are seen to be Geraldine Slattery, who heads the company’s Australian mines, Chief Financial Officer Vandita Pant and Ragnar Udd, who runs the commercial team. However, the search is also likely to include external candidates, according to people familiar with the matter, who asked not to be identified discussing private information. A change of leadership would come at a pivotal time for both BHP and the wider mining sector. The company and its biggest rivals spent the past couple of years pursuing a series of failed megadeals, while US President Donald Trump’s trade war has cast fresh uncertainty over future demand for key commodities.  BHP itself is embarking on a slew of expensive growth projects and Henry’s successor is likely to face tough questions about capital allocation, including whether the company can pursue its aggressive spending plans while sustaining its dividend and debt policies.  The miner is already tightening its belt and has significantly sharpened its focus on cost cutting across the business, some of the people said. BHP declined to comment. The process to find a replacement for Henry is likely to start in earnest in the coming months, the people said,

Read More »

Grangemouth a ‘good example of transition done badly’ – Shanks

The closure of Scotland’s last oil refinery at Grangemouth is a “really good example of a transition done badly”, the UK energy minister has admitted. Speaking at an industry conference in London, Michael Shanks said he was “acutely aware that there is uncertainty and there is unease in the industry”. He said the situation at Grangemouth, which will see hundreds of jobs lost when refinery owner PetroIneos shutters the facility, was a problem his government “inherited”. Labour politicians in Scotland and Westminster have come under fire for failing to fulfil pledges to save jobs at the site. The UK and Scottish Governments have jointly drawn up a plan called “Project Willow” aimed at delivering a long-term industrial future for Grangemouth through investment in a number of energy and recycling schemes. Speaking to the audience on the second day of the North Sea Decarbonisation Conference, Shanks said the “problems” should have been addressed years earlier. “There is a kind of truth in government, that you sort of wish you could have dealt with some of the problems you inherit on day one, many, many years before,” he said. “The most acute example for me is Grangemouth, which is a really good example of a transition done badly. “You wish you could have tackled these things five, six years ago, when they first emerged. “You don’t get to choose your entry as a government minister, and that’s just the reality of it. But what we are seeking to do is grapple with the uncertainty and the challenge.” Shanks urged attendees at the conference to engage with the government’s current consultation on the North Sea, which is set to close 30 April. The minister said energy supply chain responses to its “Building the North Sea’s Energy Future: Consultation,” consultation will help the government

Read More »

Slowdown in AWS data center leasing plans poses little threat to CIOs

Oracle, according to Westfall, is committed to investing $10 billion in 2025 to build 100 new data centers and expand 66 existing ones, aiming to double its capacity this year. Likewise, Google is investing $75 billion in 2025 for data center construction, focusing on AI and cloud infrastructure, with projects such as a $600 million facility in Mesa, Arizona, and a $2 billion data center in Fort Wayne and Indiana underway, Westfall said. Meta, too, plans to spend up to $65 billion in 2025, a sizable bump up from $40 billion in 2024, primarily for data center expansion to support AI (Llama models, Meta AI) and metaverse workloads, Westfall added. However, these expansion plans will not result in the relatively smaller players catching up with AWS and Microsoft. “For smaller players like Google and Oracle, catching up with AWS and Microsoft would require historically large capital investments that likely aren’t justified by their current growth rates,” Alletto said.

Read More »

TSMC targets AI acceleration with A14 process and ‘System on Wafer-X’

Nvidia’s flagship GPUs currently integrate two chips, while its forthcoming Rubin Ultra platform will connect four. “The SoW-X delivers wafer-scale compute performance and significantly boosts speed by integrating multiple advanced compute SoC dies, stacked HBM memory, and optical interconnects into a single package,” said Neil Shah, partner and co-founder at Counterpoint Research. “This approach reduces latency, improves power efficiency, and enhances scalability compared to traditional multi-chip setups — giving enterprises and hyperscalers AI servers capable of handling future workloads faster, more efficiently, and in a smaller footprint.” This not only boosts capex savings in the long run but also opex savings in terms of energy and space. “Wafer-X technology isn’t just about bigger chips — it’s a signal that the future of AI infrastructure is being redesigned at the silicon level,” said Abhivyakti Sengar, practice director at Everest Group. “By tightly integrating compute, memory, and optical interconnects within a single wafer-scale package, TSMC targets the core constraints of AI: bandwidth and energy. For hyperscale data centers and frontier model training, this could be a game-changer.” Priorities for enterprise customers For enterprises investing in custom AI silicon, choosing the right foundry partner goes beyond performance benchmarks. It’s about finding a balance between cutting-edge capabilities, flexibility, and cost. “First, enterprise buyers need to assess manufacturing process technologies (such as TSMC’s 3nm, 2nm, or Intel’s 18A) to determine if they meet AI chip performance and power requirements, along with customization capabilities,” said Galen Zeng, senior research manager for semiconductor research at IDC Asia Pacific. “Second, buyers should evaluate advanced packaging abilities; TSMC leads in 3D packaging and customized packaging solutions, suitable for highly integrated AI chips, while Intel has advantages in x86 architecture. 
Finally, buyers should assess pricing structures.”

Read More »

Cloudbrink pushes SASE boundaries with 300 Gbps data center throughput

Those core components are functionally table stakes and don’t really serve to differentiate Cloudbrink against its myriad competitors in the SASE market. Where Cloudbrink looks to differentiate is at a technical level through a series of innovations including: Distributed edge architecture: The company has decoupled software from hardware, allowing their platform to run across 800 data centers by leveraging public clouds, telco networks and edge computing infrastructure. This approach reduces network latency from 300 milliseconds to between 7 and 20 milliseconds, the company says. This density dramatically improves TCP performance and responsiveness. Protocol optimization: Cloudbrink developed its own algorithms for SD-WAN optimization that bring enterprise-grade reliability to last mile links. These algorithms significantly improve efficiency on consumer broadband connections, enabling enterprise-grade performance over standard internet links. Integrated security stack: “We’ve been able to produce secure speeds at line rate on our platform by bringing security to the networking stack itself,” Mana noted. Rather than treating security as a separate overlay that degrades performance, Cloudbrink integrates security functions directly into the networking stack. The solution consists of three core components: client software for user devices, a cloud management plane, and optional data center connectors for accessing internal applications. The client intelligently connects to multiple edge nodes simultaneously, providing redundancy and application-specific routing optimization. Cloudbrink expands global reach Beyond its efforts to increase throughput, Cloudbrink is also growing its global footprint. Cloudbrink today announced a global expansion through new channel agreements and the opening of a Brazil office to serve emerging markets in Latin America, Korea and Africa. 
The expansion includes exclusive partnerships with WITHX in Korea, BAMM Technologies for Latin America distribution and OneTic for African markets. The company’s software-defined FAST (Flexible, Autonomous, Smart and Temporary) Edges technology enables rapid deployment of points of presence by leveraging existing infrastructure from multiple

Read More »

CIOs could improve sustainability with data center purchasing decisions — but don’t

CIOs can drive change Even though it’s difficult to calculate an organization’s carbon footprint, CIOs and IT purchasing leaders trying to reduce their environmental impact can influence data center operators, experts say. “Customers have a very large voice,” Seagate’s Feist says. “Don’t underestimate how powerful that CIO feedback loop is. The large cloud accounts are customer-obsessed organizations, so they listen, and they react.” While DataBank began using renewable energy years ago, customer demand can push more data center operators to follow suit, Gerson says. “For sure, if there is a requirement to purchase renewable power, we are going to purchase renewable power,” she adds.

Read More »

Copper-to-optics technology eyed for next-gen AI networking gear

Broadcom’s demonstration and a follow-up session explored the benefits of further developing CPC, such as reduced signal integrity penalties and extended reach, through channel modeling and simulations, Broadcom wrote in a blog about the DesignCon event. “Experimental results showed successful implementation of CPC, demonstrating its potential to address bandwidth and signal integrity challenges in data centers, which is crucial for AI applications,” Broadcom stated. In addition to the demo, Broadcom and Samtec also authored a white paper on CPC that stated: “Co-packaged connectivity (CPC) provides the opportunity to omit loss and reflection penalties from the [printed circuit board (PCB)] and the package. When high speed I/O is cabled from the top of the package advanced PCB materials are not necessary. Losses from package vertical paths and PCB routing can be transferred to the longer reach of cables,” the authors stated. “As highly complex systems are challenged to scale the number of I/O and their reach, co- packaged connectivity presents opportunity. As we approach 224G-PAM4 [which uses optical techniques to support 224 Gigabits per second data rates per optical lane] and above, system loss and dominating noise sources necessitate the need to re-consider that which has been restricted in the back of the system architect’s mind for years: What if we attached to the package?” At OFC, Samtec demonstrated its Si-FlyHD co-packaged cable assemblies and Samtec FlyoverOctal Small Form-factor Pluggable (OSFP) over the Samtec Eye Speed Hyper Low Skew twinax copper cable. Flyover is Samtec’s proprietary way of addressing signal integrity and reach limitations of routing high-speed signals through traditional printed circuit boards (PCBs). “This evaluation platform incorporates Broadcom’s industry-leading 200G SerDes technology and Samtec’s co-packaged Flyover technology. 
Si-Fly HD CPC offers the industry’s highest footprint density and robust interconnect which enables 102.4T (512 lanes at 200G) in a 95 x

Read More »

The Rise of AI Factories: Transforming Intelligence at Scale

AI Factories Redefine Infrastructure The architecture of AI factories reflects a paradigm shift that mirrors the evolution of the industrial age itself—from manual processes to automation, and now to autonomous intelligence. Nvidia’s framing of these systems as “factories” isn’t just branding; it’s a conceptual leap that positions AI infrastructure as the new production line. GPUs are the engines, data is the raw material, and the output isn’t a physical product, but predictive power at unprecedented scale. In this vision, compute capacity becomes a strategic asset, and the ability to iterate faster on AI models becomes a competitive differentiator, not just a technical milestone. This evolution also introduces a new calculus for data center investment. The cost-per-token of inference—how efficiently a system can produce usable AI output—emerges as a critical KPI, replacing traditional metrics like PUE or rack density as primary indicators of performance. That changes the game for developers, operators, and regulators alike. Just as cloud computing shifted the industry’s center of gravity over the past decade, the rise of AI factories is likely to redraw the map again—favoring locations with not only robust power and cooling, but with access to clean energy, proximity to data-rich ecosystems, and incentives that align with national digital strategies. The Economics of AI: Scaling Laws and Compute Demand At the heart of the AI factory model is a requirement for a deep understanding of the scaling laws that govern AI economics. Initially, the emphasis in AI revolved around pretraining large models, requiring massive amounts of compute, expert labor, and curated data. Over five years, pretraining compute needs have increased by a factor of 50 million. However, once a foundational model is trained, the downstream potential multiplies exponentially, while the compute required to utilize a fully trained model for standard inference is significantly less than

Read More »
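The cost-per-token KPI the excerpt describes can be made concrete with a little arithmetic. A minimal sketch, with illustrative numbers that are assumptions rather than figures from the article:

```python
# Hedged sketch: cost-per-token as an inference KPI.
# The $4/hour GPU cost and 1,000 tokens/s throughput below are
# hypothetical example values, not data from the article.

def cost_per_token(gpu_hour_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars per generated token for a single accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost_usd / tokens_per_hour

# Example: a $4/hour GPU sustaining 1,000 tokens/s
print(f"${cost_per_token(4.0, 1000):.8f} per token")
```

Under these assumptions the figure works out to roughly a millionth of a dollar per token, which is why operators quote costs per million tokens rather than per token.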

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Microsoft president Brad Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are far higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of the AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. Moline, Illinois-based John Deere has been in business for 187 years, yet the non-tech company has been a regular at the big tech trade show in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »
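The LLM-as-a-judge pattern the excerpt mentions can be sketched in a few lines: several candidate models answer the same question, a judge model scores each answer, and the best-scoring answer wins. This is a hedged illustration, not the authors' implementation; `call_model` and `judge_score` are hypothetical stubs that would wrap real provider APIs in practice.

```python
# Sketch of the LLM-as-a-judge pattern: best-of-N selection across models.
# call_model and judge_score are hypothetical stubs, NOT real provider APIs.

def call_model(model: str, prompt: str) -> str:
    # Stub: in practice this would call a real LLM API for `model`.
    return f"[{model}] answer to: {prompt}"

def judge_score(judge: str, question: str, answer: str) -> float:
    # Stub: in practice the judge model would be prompted to rate the
    # answer; here we fake a deterministic score from the answer text.
    return (len(answer) % 10) / 10.0

def best_of_n(question: str, candidates: list[str],
              judge: str = "judge-model") -> str:
    """Ask every candidate model, score each answer, return the winner."""
    answers = {m: call_model(m, question) for m in candidates}
    scores = {m: judge_score(judge, question, a) for m, a in answers.items()}
    winner = max(scores, key=scores.get)
    return answers[winner]

print(best_of_n("What is RAG?", ["model-a", "model-b", "model-c"]))
```

As the excerpt notes, this design only becomes economical when model calls are cheap, since every question costs N candidate calls plus N judge calls.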

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement learning and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models through these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model, because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and the U.S. National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find.
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »