This Is How LLMs Break Down the Language

Do you remember the hype when OpenAI released GPT-3 in 2020? Though not the first model in its series, GPT-3 gained widespread popularity due to its impressive text generation capabilities. Since then, a diverse crop of Large Language Models (LLMs) has flooded the AI landscape. The golden question is: Have you ever wondered how ChatGPT, or any other LLM, breaks down language? If you haven’t, we are going to discuss the mechanism by which LLMs process the textual input given to them during training and inference. This mechanism is called tokenization.

This article is inspired by the YouTube video titled Deep Dive into LLMs like ChatGPT by Andrej Karpathy, former Senior Director of AI at Tesla. His general-audience video series is highly recommended for anyone who wants to take a deep dive into the intricacies behind LLMs.

Before diving into the main topic, it helps to have an understanding of the inner workings of an LLM. In the next section, I’ll break down the internals of a language model and its underlying architecture. If you’re already familiar with neural networks and LLMs in general, you can skip the next section without affecting your reading experience.

Internals of large language models

LLMs are made up of transformer neural networks. Consider neural networks as giant mathematical expressions. Inputs to neural networks are a sequence of tokens that are typically processed through embedding layers, which convert the tokens into numerical representations. For now, think of tokens as basic units of input data, such as words, phrases, or characters. In the next section, we’ll explore how to create tokens from input text data in depth. When we feed these inputs to the network, they are mixed into a giant mathematical expression along with the parameters or weights of these neural networks.
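To make “tokens in, numbers out” concrete, here is a minimal sketch of an embedding lookup in PyTorch. The vocabulary size, embedding width, and token ids are placeholder values chosen for illustration, not those of any production model.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 256, 16                  # hypothetical sizes, for illustration only

embedding = nn.Embedding(vocab_size, d_model)  # one learnable vector per token id

token_ids = torch.tensor([[72, 101, 108, 108, 111]])  # a toy sequence of five token ids
vectors = embedding(token_ids)                         # shape: (1, 5, 16)
print(vectors.shape)
```

Each row of the embedding table is just another set of weights that gets adjusted during training.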

Modern neural networks have billions of parameters. At the beginning, these parameters or weights are set randomly. Therefore, the neural network randomly guesses its predictions. During the training process, we iteratively update these weights so that the outputs of our neural network become consistent with the patterns observed in our training set. In a sense, neural network training is about finding the right set of weights that seem to be consistent with the statistics of the training set.
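As a rough sketch of what “iteratively updating weights” means, the snippet below runs a single gradient-descent step on a toy next-token predictor. The model, loss, and learning rate are arbitrary choices for the example, not the actual recipe of any real LLM.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 256, 16
model = nn.Sequential(nn.Embedding(vocab_size, d_model),   # weights start out random
                      nn.Linear(d_model, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.tensor([72, 101, 108, 108])     # current tokens
targets = torch.tensor([101, 108, 108, 111])   # the tokens that actually come next

logits = model(inputs)                                # random guesses at first
loss = nn.functional.cross_entropy(logits, targets)   # how wrong are we?
loss.backward()                                       # compute how each weight should change
optimizer.step()                                      # nudge the weights in that direction
```

Repeating this loop over the training data is what gradually makes the model’s outputs consistent with the statistics of the training set.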

The transformer architecture was introduced in the paper titled “Attention Is All You Need” by Vaswani et al. in 2017. It is a neural network with a special kind of structure designed for sequence processing. Initially intended for neural machine translation, it has since become the foundational building block of LLMs.

To get a sense of what production-grade transformer neural networks look like, visit https://bbycroft.net/llm. This site provides interactive 3D visualizations of generative pre-trained transformer (GPT) architectures and guides you through their inference process.

Visualization of Nano-GPT at https://bbycroft.net/llm (Image by the author)

This particular architecture, called Nano-GPT, has 85,584 parameters. We feed the inputs, which are token sequences, at the top of the network. Information then flows through the layers of the network, where the input undergoes a series of transformations, including attention mechanisms and feed-forward networks, to produce an output. The output is the model’s prediction for the next token in the sequence.
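The sketch below mirrors that flow in code: token ids go in at the top, pass through attention and feed-forward blocks, and come out as a score (logit) for every candidate next token. It is a toy stand-in built from standard PyTorch modules, not Nano-GPT itself, and every size in it is made up.

```python
import torch
import torch.nn as nn

class ToyGPT(nn.Module):
    """A miniature decoder-only transformer, for illustration only."""
    def __init__(self, vocab_size=256, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token ids -> vectors
        self.pos_emb = nn.Embedding(max_len, d_model)      # position information
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)      # vectors -> next-token logits

    def forward(self, idx):                                 # idx: (batch, seq_len) of token ids
        seq_len = idx.shape[1]
        pos = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.blocks(x, mask=causal)                     # attention + feed-forward layers
        return self.lm_head(x)

tokens = torch.randint(0, 256, (1, 16))   # one sequence of 16 token ids
logits = ToyGPT()(tokens)                 # shape (1, 16, 256): a prediction at every position
```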

Tokenization

Training a state-of-the-art language model like ChatGPT or Claude involves several stages arranged sequentially. In my previous article about hallucinations, I briefly explained the training pipeline for an LLM. If you want to learn more about training stages and hallucinations, you can read it here.

Now, imagine we’re at the initial stage of training, called pretraining. This stage requires a large, high-quality, web-scale dataset measured in terabytes. The datasets used by major LLM providers are not publicly available, so we will look into an open-source dataset curated by Hugging Face called FineWeb, distributed under the Open Data Commons Attribution License. You can read more about how they collected and created this dataset here.

FineWeb dataset curated by Hugging Face (Image by the author)

I downloaded a sample from the FineWeb dataset, selected the first 100 examples, and concatenated them into a single text file. This is just raw internet text with various patterns within it.

Sampled text from the FineWeb dataset (Image by the author)
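If you want to reproduce something similar, a sample can be pulled with the Hugging Face datasets library. Below is a minimal sketch; the dataset path "HuggingFaceFW/fineweb" and the "sample-10BT" config are assumptions about the public release, so check the dataset card for the current names.

```python
from itertools import islice
from datasets import load_dataset  # pip install datasets

# Stream the FineWeb sample so we don't download the full multi-terabyte dump.
# The dataset path and config name are assumptions; verify them on the dataset card.
ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT", split="train", streaming=True)

examples = [row["text"] for row in islice(ds, 100)]        # first 100 documents
with open("fineweb_sample.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(examples))                           # concatenate into one text file
```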

So our goal is to feed this data to the transformer neural network so that the model learns the flow of this text. We need to train our neural network to mimic the text. Before plugging this text into the neural network, we must decide how to represent it. Neural networks expect a one-dimensional sequence of symbols. That requires a finite set of possible symbols. Therefore, we must determine what these symbols are and how to represent our data as a one-dimensional sequence of them.

What we have at this point is a one-dimensional sequence of text. Underlying this text is a representation as a sequence of raw bits: encoding the original text with UTF-8 gives us that bit sequence. If you check the image below, you can see that the first 8 bits of the raw bit sequence correspond to the first letter ‘A’ of the original one-dimensional text sequence.

Sampled text, represented as a one-dimensional sequence of bits (Image by the author)
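A minimal sketch of that encoding step; the string here is just a stand-in for the concatenated sample.

```python
text = "A sample of raw internet text..."            # stand-in for the FineWeb sample

raw_bytes = text.encode("utf-8")                      # UTF-8 encoding of the text
bits = "".join(f"{byte:08b}" for byte in raw_bytes)   # write each byte out as 8 bits

print(bits[:8])    # '01000001' -> the 8 bits that encode the first letter 'A'
print(len(bits))   # the bit sequence is 8 times longer than the byte sequence
```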

Now, we have a very long sequence with two symbols: zero and one. This is, in fact, what we were looking for — a one-dimensional sequence of symbols with a finite set of possible symbols. Now the problem is that sequence length is a precious resource in a neural network primarily because of computational efficiency, memory constraints, and the difficulty of processing long dependencies. Therefore, we don’t want extremely long sequences of just two symbols. We prefer shorter sequences of more symbols. So, we are going to trade off the number of symbols in our vocabulary against the resulting sequence length.

As we need to further compress or shorten our sequence, we can group every 8 consecutive bits into a single byte. Since each bit is either 0 or 1, there are exactly 256 possible combinations of 8-bit sequences. Thus, we can represent this sequence as a sequence of bytes instead.

Grouping bits to bytes (Image by the author)

This representation reduces the length by a factor of 8, while expanding the symbol set to 256 possibilities. Consequently, each value in the sequence will fall within the range of 0 to 255.

Sampled text, represented as a one-dimensional sequence of bytes (Image by the author)
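Continuing the sketch from above, grouping every 8 consecutive bits back into one symbol recovers integer values between 0 and 255:

```python
text = "A sample of raw internet text..."                      # same stand-in text as before
bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

# Group every 8 consecutive bits into a single symbol: an integer between 0 and 255.
byte_seq = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]

print(byte_seq[:5])              # [65, 32, 115, 97, 109] -- identifiers, not quantities
print(len(bits), len(byte_seq))  # the byte sequence is 8 times shorter
```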

These numbers do not have any value in a numerical sense. They are just placeholders for unique identifiers or symbols. In fact, we could replace each of these numbers with a unique emoji and the core idea would still stand. Think of this as a sequence of emojis, each chosen from 256 unique options.

Sampled text, represented as a one-dimensional sequence of emojis (Image by the author)

This process of converting from raw text into symbols is called Tokenization. Tokenization in state-of-the-art language models goes even beyond this. We can further compress the length of the sequence in return for more symbols in our vocabulary using the Byte-Pair Encoding (BPE) algorithm. Initially developed for text compression, BPE is now widely used by transformer models for tokenization. OpenAI’s GPT series uses standard and customized versions of the BPE algorithm.

Essentially, Byte-Pair Encoding works by identifying pairs of bytes (or, more generally, symbols) that frequently appear next to each other. For example, let’s look at our byte-level sequence of text.

Sequence 101, followed by 114, is quite frequent (Image by the author)

As you can see, the sequence 101 followed by 114 appears frequently. Therefore, we can replace this pair with a new symbol and assign it a unique identifier. We are going to rewrite every occurrence of 101 114 using this new symbol. This process can be repeated multiple times, with each iteration further shortening the sequence length while introducing additional symbols, thereby increasing the vocabulary size. Using this process, GPT-4 has come up with a token vocabulary of around 100,000.
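Here is a bare-bones sketch of that merge loop, in the spirit of Karpathy’s walkthrough. It illustrates the idea only; the tokenizers used by GPT-4 and friends add many practical refinements on top.

```python
from collections import Counter

def most_frequent_pair(seq):
    """Find the pair of consecutive symbols that appears most often."""
    return Counter(zip(seq, seq[1:])).most_common(1)[0][0]

def merge(seq, pair, new_id):
    """Rewrite every occurrence of `pair` as the single new symbol `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i < len(seq) - 1 and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

tokens = list("hello there, hello world".encode("utf-8"))  # start from raw bytes (0-255)

for step in range(5):                         # real tokenizers run tens of thousands of merges
    pair = most_frequent_pair(tokens)         # e.g. (104, 101), the bytes for 'h' and 'e'
    tokens = merge(tokens, pair, 256 + step)  # new symbols get ids beyond the byte range
```

Each pass shortens the token sequence and adds one entry to the vocabulary, which is exactly the trade-off described above.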

We can further explore tokenization using Tiktokenizer. Tiktokenizer provides an interactive web-based graphical user interface where you can input text and see how it’s tokenized according to different models. Play with this tool to get an intuitive understanding of what these tokens look like.

For example, we can take the first four sentences of the text sequence and input them into the Tiktokenizer. From the dropdown menu, select the GPT-4 base model encoder: cl100k_base.

Tiktokenizer (Image by the author)

The colored text shows how chunks of text correspond to symbols. The following sequence of 51 tokens is what GPT-4 will see at the end of the day.

11787, 499, 21815, 369, 90250, 763, 14689, 30, 7694, 1555, 279, 21542, 3770, 323, 499, 1253, 1120, 1518, 701, 4832, 2457, 13, 9359, 1124, 323, 6642, 264, 3449, 709, 3010, 18396, 13, 1226, 617, 9214, 315, 1023, 3697, 430, 1120, 649, 10379, 83, 3868, 311, 3449, 18570, 1120, 1093, 499, 0
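You can reproduce this kind of encoding locally with OpenAI’s tiktoken library, which ships the cl100k_base encoding used here. The sample string below is a stand-in for the sentences pasted into Tiktokenizer.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the GPT-4 base encoding

text = "Paste the sample sentences here..."  # stand-in text
token_ids = enc.encode(text)

print(len(token_ids), token_ids)             # sequence length and the token ids
print(enc.decode(token_ids) == text)         # decoding round-trips back to the original text
```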

We can now take our entire sample dataset and re-represent it as a sequence of tokens using the GPT-4 base model tokenizer, cl100k_base. Note that the original FineWeb dataset consists of a 15-trillion-token sequence, while our sample dataset contains only a few thousand tokens from the original dataset.

Sampled text, represented as a one-dimensional sequence of tokens (Image by the author)

Conclusion

Tokenization is a fundamental step in how LLMs process text, transforming raw text data into a structured format before it is fed into neural networks. As neural networks require a one-dimensional sequence of symbols, we need to strike a balance between sequence length and the number of symbols in the vocabulary, optimizing for efficient computation. Modern transformer-based LLMs, from GPT-2 onward, use Byte-Pair Encoding tokenization.

Breaking down tokenization helps demystify how LLMs interpret text inputs and generate coherent responses. Having an intuitive sense of what tokenization looks like helps in understanding the internal mechanisms behind the training and inference of LLMs. As LLMs are increasingly used as a knowledge base, a well-designed tokenization strategy is crucial for improving model efficiency and overall performance.

If you enjoyed this article, connect with me on X (formerly Twitter) for more insights.

