
How LLMs Work: Pre-Training to Post-Training, Neural Networks, Hallucinations, and Inference


With the recent explosion of interest in large language models (LLMs), they often seem almost magical. But let’s demystify them.

I wanted to step back and unpack the fundamentals — breaking down how LLMs are built, trained, and fine-tuned to become the AI systems we interact with today.

This two-part deep dive is something I’ve been meaning to do for a while. It was also inspired by Andrej Karpathy’s hugely popular 3.5-hour YouTube video, which racked up 800,000+ views in just 10 days. Andrej is a founding member of OpenAI, and his insights are gold — you get the idea.

If you have the time, his video is definitely worth watching. But let’s be real — 3.5 hours is a long watch. So, for all the busy folks who don’t want to miss out, I’ve distilled the key concepts from the first 1.5 hours into this 10-minute read, adding my own breakdowns to help you build a solid intuition.

What you’ll get

Part 1 (this article): Covers the fundamentals of LLMs, including pre-training and post-training, neural networks, hallucinations, and inference.

Part 2: Reinforcement learning with human/AI feedback, and a look at the o1 models, DeepSeek R1, and AlphaGo

Let’s go! I’ll start by looking at how LLMs are built.

At a high level, there are two key phases: pre-training and post-training.

1. Pre-training

Before an LLM can generate text, it must first learn how language works. This happens through pre-training, a highly computationally intensive task.

Step 1: Data collection and preprocessing

The first step in training an LLM is gathering as much high-quality text as possible. The goal is to create a massive and diverse dataset containing a wide range of human knowledge.

One source is Common Crawl, a free, open repository of web crawl data containing over 250 billion web pages collected across 18 years. However, raw web data is noisy — full of spam, duplicates, and low-quality content — so preprocessing is essential. If you’re interested in preprocessed datasets, FineWeb offers a curated version of Common Crawl and is available on Hugging Face.
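If you want a feel for this data, here is a minimal sketch of streaming FineWeb with the Hugging Face datasets library. The dataset ID and the text field follow the published dataset card; treat the details as assumptions.

```python
# Minimal sketch: stream a few documents from FineWeb on Hugging Face.
# Requires `pip install datasets`; dataset ID and field name assumed
# from the FineWeb dataset card.
from datasets import load_dataset

# Stream instead of downloading: the full dataset is terabytes of text.
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, doc in enumerate(fineweb):
    print(doc["text"][:200])  # each record holds one cleaned web page
    if i == 2:
        break
```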

Once cleaned, the text corpus is ready for tokenization.

Step 2: Tokenization

Before a neural network can process text, it must be converted into numerical form. This is done through tokenization, where words, subwords, or characters are mapped to unique numerical tokens.

Think of tokens as the fundamental building blocks of all language models. In GPT-4, there are 100,277 possible tokens. The Tiktokenizer web app lets you experiment with tokenization and see how text is broken down into tokens. Try entering a sentence, and you’ll see each word or subword mapped to its numerical ID.
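Here is a minimal sketch of the same idea in code, using OpenAI’s tiktoken library with cl100k_base, the encoding used by GPT-4:

```python
# Minimal tokenization sketch (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4's encoding

ids = enc.encode("we are cooking")
print(ids)                             # a short list of integer token IDs
print([enc.decode([t]) for t in ids])  # the subword piece behind each ID
print(enc.n_vocab)                     # 100277 possible tokens in this encoding
```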

Step 3: Neural network training

Once the text is tokenized, the neural network learns to predict the next token based on its context. The model takes an input sequence of tokens (e.g., “we are cook ing”, split into four tokens) and processes it through a giant mathematical expression — the model’s architecture — to predict the next token.

A neural network consists of two key parts:

  1. Parameters (weights) — the learned numerical values from training.
  2. Architecture (mathematical expression) — the structure defining how the input tokens are processed to produce outputs.

Initially, the model’s predictions are random, but as training progresses, it learns to assign probabilities to possible next tokens.

When the correct token (e.g., “food”) is identified, the model adjusts its billions of parameters (weights) through backpropagation — an optimization process that increases the probability of the correct token while reducing the probabilities of incorrect ones.

This process is repeated billions of times across massive datasets.
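To make one such step concrete, here is a minimal PyTorch sketch. The two-layer model and the token IDs are illustrative stand-ins, not a real LLM architecture:

```python
# Toy next-token-prediction training step in PyTorch (illustrative only).
import torch
import torch.nn as nn

vocab_size, embed_dim = 100_277, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # parameters: token ID -> vector
    nn.Linear(embed_dim, vocab_size),     # parameters: vector -> score per token
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

context = torch.tensor([[1001, 2002, 3003]])  # made-up IDs for "we are cook"
target = torch.tensor([4004])                 # made-up ID for the correct next token

logits = model(context)[:, -1, :]             # scores for the token after the context
loss = nn.functional.cross_entropy(logits, target)  # low when correct token is likely

loss.backward()         # backpropagation: gradient for every parameter
optimizer.step()        # nudge weights toward the correct prediction
optimizer.zero_grad()
```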

Base model — the output of pre-training

At this stage, the base model has learned:

  • How words, phrases and sentences relate to each other
  • Statistical patterns in its training data

However, base models are not yet optimised for real-world tasks. You can think of them as an advanced autocomplete system — they predict the next token based on probability, but with limited instruction-following ability.

A base model can sometimes recite training data verbatim and can be used for certain applications through in-context learning, where you guide its responses by providing examples in your prompt. However, to make the model truly useful and reliable, it requires further training.
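As an illustration, here is a hypothetical few-shot prompt. Fed to a base model, the examples alone steer the completion toward the desired pattern, with no fine-tuning involved:

```python
# Hypothetical in-context learning prompt for a base model.
prompt = """English: cheese
French: fromage

English: bread
French: pain

English: apple
French:"""
# A base model continuing this text will very likely complete it
# with " pomme", purely because the prompt established the pattern.
```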

2. Post-training — Making the model useful

Base models are raw and unrefined. To make them helpful, reliable, and safe, they go through post-training, where they are fine-tuned on smaller, specialised datasets.

Because the model is a neural network, it cannot be explicitly programmed like traditional software. Instead, we “program” it implicitly by training it on structured labeled datasets that represent examples of desired interactions.

How post-training works

Specialised datasets are created, consisting of structured examples of how the model should respond in different situations.

Some types of post-training include:

  1. Instruction/conversation fine-tuning
    Goal: Teach the model to follow instructions, be task-oriented, engage in multi-turn conversations, follow safety guidelines, and refuse malicious requests.
    E.g., InstructGPT (2022): OpenAI hired some 40 contractors to create these labelled datasets. The human annotators wrote prompts and provided ideal responses based on safety guidelines. Today, many datasets are generated automatically, with humans reviewing and editing them for quality.
  2. Domain-specific fine-tuning
    Goal: Adapt the model for specialised fields like medicine, law, and programming.

Post training also introduces special tokens — symbols that were not used during pre-training — to help the model understand the structure of interactions. These tokens signal where a user’s input starts and ends and where the AI’s response begins, ensuring that the model correctly distinguishes between prompts and replies.
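As one concrete illustration, here is a two-turn exchange in the ChatML-style format used by some models. The exact special tokens differ from model to model, so treat these markers as an example of the idea rather than a universal standard:

```python
# Illustrative ChatML-style encoding of a conversation.
conversation = (
    "<|im_start|>user\n"
    "What is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "2 + 2 = 4.<|im_end|>\n"
)
# <|im_start|> and <|im_end|> are special tokens introduced in post-training,
# so they unambiguously mark where each turn begins and ends.
```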

Now, we’ll move on to some other key concepts.

Inference — how the model generates new text

Inference can be performed at any stage, even midway through pre-training, to evaluate how well the model has learned.

When given an input sequence of tokens, the model assigns probabilities to all possible next tokens based on patterns it has learned during training.

Instead of always choosing the most likely token, it samples from this probability distribution — similar to flipping a biased coin, where higher-probability tokens are more likely to be selected.

This process repeats iteratively, with each newly generated token becoming part of the input for the next prediction. 

Token selection is stochastic, so the same input can produce different outputs. Over time, the model generates text that wasn’t explicitly in its training data but follows the same statistical patterns.
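Here is a minimal sketch of that sampling step. The four logits are made up; a real model produces one score per vocabulary token:

```python
# Sampling one token from a toy probability distribution.
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # made-up scores for 4 tokens
probs = torch.softmax(logits, dim=-1)         # approx. [0.61, 0.22, 0.14, 0.03]

next_token = torch.multinomial(probs, num_samples=1)  # flip the biased coin
print(next_token)  # usually token 0, but reruns can pick any of them
```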

Hallucinations — when LLMs generate false info

Why do hallucinations occur?

Hallucinations happen because LLMs do not “know” facts — they simply predict the most statistically likely sequence of words based on their training data.

Early models struggled significantly with hallucinations.

For instance, if the training data contains many “Who is…?” questions with definitive answers, the model learns that such queries should always receive confident responses, even when it lacks the necessary knowledge.

When asked about an unknown person, the model does not default to “I don’t know” because this pattern was not reinforced during training. Instead, it generates its best guess, often leading to fabricated information.

How do you reduce hallucinations?

Method 1: Saying “I don’t know”

Improving factual accuracy requires explicitly training the model to recognise what it does not know — a task that is more complex than it seems.

This is done via self-interrogation, a process that helps define the model’s knowledge boundaries.

Self-interrogation can be automated using another AI model, which generates questions to probe knowledge gaps. If the model under test produces a false answer, new training examples are added in which the correct response is: “I’m not sure. Could you provide more context?”

If a model has seen a question many times in training, it will assign a high probability to the correct answer.

If the model has not encountered the question before, it distributes probability more evenly across multiple possible tokens, making the output more randomised. No single token stands out as the most likely choice.

Fine-tuning explicitly trains the model to handle low-confidence outputs with predefined responses.
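Conceptually, the added training data looks something like the pairs below (invented for illustration):

```python
# Hypothetical fine-tuning examples teaching a refusal pattern.
training_examples = [
    {   # a made-up name the model cannot know
        "prompt": "Who is Orson Kovacs?",
        "response": "I'm not sure. Could you provide more context?",
    },
    {   # a name seen many times in training, answered confidently
        "prompt": "Who is Barack Obama?",
        "response": "Barack Obama served as the 44th President of the United States.",
    },
]
```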

For example, when I asked ChatGPT-4o, “Who is asdja rkjgklfj?”, it correctly responded: “I’m not sure who that is. Could you provide more context?”

Method 2: Doing a web search

A more advanced method is to extend the model’s knowledge beyond its training data by giving it access to external search tools.

At a high level, when a model detects uncertainty, it can trigger a web search. The search results are then inserted into the model’s context window — essentially allowing this new data to become part of its working memory. The model references this new information while generating a response.
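A minimal sketch of that flow, with a stubbed-out search function standing in for a real search API:

```python
# Sketch: paste search results into the prompt as working memory.
def web_search(query: str) -> list[str]:
    # A real system would call a search API here; stubbed for illustration.
    return ["Snippet 1 about the query...", "Snippet 2 about the query..."]

question = "What happened in the news today?"
snippets = "\n".join(web_search(question))

prompt = (
    "Use the search results below to answer the question.\n\n"
    f"Search results:\n{snippets}\n\n"
    f"Question: {question}\n"
)
# `prompt` is sent to the model; the snippets now sit in its context window.
```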

Vague recollections vs working memory

Generally speaking, LLMs have two types of knowledge access.

  1. Vague recollections — the knowledge stored in the model’s parameters from pre-training. This is based on patterns learned from vast amounts of internet data but is neither precise nor searchable.
  2. Working memory — the information available in the model’s context window, which is directly accessible during inference. Any text provided in the prompt acts as short-term memory, allowing the model to recall details while generating responses.

Adding relevant facts within the context window significantly improves response quality.

Knowledge of self 

When asked questions like “Who are you?” or “What built you?”, an LLM will generate a statistical best guess based on its training data, unless explicitly programmed to respond accurately. 

LLMs do not have true self-awareness; their responses depend on patterns seen during training.

One way to provide the model with a consistent identity is by using a system prompt, which sets predefined instructions about how it should describe itself, its capabilities, and its limitations.
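In chat-style APIs, that typically looks like a system message at the top of the conversation. The names below are placeholders:

```python
# Illustrative system prompt giving the model a consistent identity.
messages = [
    {
        "role": "system",
        "content": "You are HelpBot, an assistant built by ExampleCorp. "
                   "Describe yourself and your limitations accurately.",
    },
    {"role": "user", "content": "Who are you?"},
]
# Without the system message, the model would fall back on a statistical
# best guess learned from its training data.
```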

To end off

That’s a wrap for Part 1! I hope this has helped you build intuition on how LLMs work. In Part 2, we’ll dive deeper into reinforcement learning and some of the latest models.

Got questions or ideas for what I should cover next? Drop them in the comments — I’d love to hear your thoughts. See you in Part 2! 🙂
