LLM + RAG: Creating an AI-Powered File Reader Assistant

Introduction

AI is everywhere. 

It is hard not to interact at least once a day with a Large Language Model (LLM). The chatbots are here to stay. They’re in your apps, they help you write better, they compose emails, they read emails…well, they do a lot.

And I don’t think that is bad. In fact, my opinion is quite the opposite – at least so far. I defend and advocate for the use of AI in our daily lives because, let’s agree, it makes everything much easier.

I don’t have to spend time double-reading a document to find punctuation problems or typos. AI does that for me. I don’t waste time writing that follow-up email every single Monday. AI does that for me. I don’t need to read a huge and boring contract when I have an AI to summarize the main takeaways and action points for me!

These are only some of AI’s great uses. If you’d like to know more use cases of LLMs to make our lives easier, I wrote a whole book about them.

Now, thinking as a data scientist and looking at the technical side, not everything is that bright and shiny. 

LLMs are great for several general use cases that apply to anyone or any company: for example, coding, summarizing, or answering questions about general content created before the training cutoff date. However, when it comes to specific business applications, a single narrow purpose, or something new that didn’t make the cutoff date, out-of-the-box models won’t be that useful – they simply will not know the answer. Thus, they will need adjustments.

Training an LLM can take months and millions of dollars. What is even worse is that if we don’t adjust and tune the model to our purpose, we will get unsatisfactory results or hallucinations (when the model’s response doesn’t make sense given our query).

So what is the solution, then? Spending a lot of money retraining the model to include our data?

Not really. That’s when Retrieval-Augmented Generation (RAG) becomes useful.

RAG is a framework that combines retrieving information from an external knowledge base with a large language model (LLM), helping the model produce more accurate and relevant responses.

Let’s learn more about RAG next.

What is RAG?

Let me tell you a story to illustrate the concept.

I love movies. There was a time when I knew which movies were competing for Best Picture at the Oscars and which actors and actresses were nominated. And I would certainly know which ones took the statue home that year. But now I am all rusty on that subject. If you asked me who was competing, I would not know, and even if I tried to answer, I would give you a weak response.

So, to provide you with a quality response, I will do what everybody else does: search for the information online, obtain it, and then give it to you. What I just did follows the same idea as RAG: I obtained data from an external source to give you an answer.

When we enhance the LLM with a content store where it can go and retrieve data to augment (increase) its knowledge base, that is the RAG framework in action.

RAG is like creating a content store where the model can enhance its knowledge and respond more accurately.

Diagram: User prompts and content using LLM + RAG
User prompt about content C; the LLM retrieves external content to augment its answer. Image by the author.

Summarizing, the RAG framework (illustrated with a short sketch after this list):

  1. Uses search algorithms to query external data sources, such as databases, knowledge bases, and web pages.
  2. Pre-processes the retrieved information.
  3. Incorporates the pre-processed information into the LLM.
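To make those three steps concrete, here is a toy, purely illustrative sketch in Python. The mini “knowledge base”, the query, and the keyword-overlap “search algorithm” are all made up for illustration; no real LLM is called, we only build the augmented prompt that would be sent to one.

# Toy illustration of the three RAG steps above (not production code)

documents = [
    "Oppenheimer won the Oscar for Best Picture at the 2024 ceremony.",
    "RAG combines retrieval from an external knowledge base with an LLM.",
    "FAISS is a library for efficient similarity search over vectors.",
]

def retrieve(query, docs, k=1):
    # Step 1: query the external data source
    # (a naive keyword-overlap score stands in for a real search algorithm)
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

query = "Which movie won Best Picture in 2024?"

# Step 2: pre-process the retrieved information (here, just join the top chunks)
context = "\n".join(retrieve(query, documents))

# Step 3: incorporate it into the prompt that would be passed to the LLM
prompt = f"Use this context to answer the question.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)

In the real project below, the keyword overlap is replaced by embeddings and a FAISS vector database, and the prompt is sent to an actual LLM.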

Why use RAG?

Now that we know what the RAG framework is, let’s understand why we should use it.

Here are some of the benefits:

  • Enhances factual accuracy by referencing real data.
  • Helps LLMs process and consolidate knowledge to create more relevant answers.
  • Helps LLMs access additional knowledge bases, such as internal organizational data.
  • Helps LLMs create more accurate domain-specific content.
  • Helps reduce knowledge gaps and AI hallucinations.

As previously explained, I like to say that with the RAG framework, we are giving the LLM an internal search engine over the content we want to add to its knowledge base.

Well, all of that is very interesting. But let’s see an application of RAG. We will learn how to create an AI-powered PDF Reader Assistant.

Project

This is an application that allows users to upload a PDF document and ask questions about its content using AI-powered natural language processing (NLP) tools. 

  • The app uses Streamlit as the front end.
  • LangChain, OpenAI’s GPT-4o-mini model, and FAISS (Facebook AI Similarity Search) handle document retrieval and question answering in the backend.

Let’s break down the steps for better understanding:

  1. Load the PDF file and split it into chunks of text. This makes the data optimized for retrieval.
  2. Present the chunks to an embedding tool. Embeddings are numerical vector representations of data used to capture relationships, similarities, and meanings in a way that machines can understand. They are widely used in Natural Language Processing (NLP), recommender systems, and search engines (see the short similarity sketch right after this list).
  3. Put those chunks of text and their embeddings in the same database for retrieval.
  4. Finally, make that database available to the LLM.
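To get a feel for what embeddings do, here is a small, optional sketch. It assumes the sentence-transformers and langchain-huggingface packages are installed (the same model is suggested later as an alternative to OpenAI’s embeddings); the example sentences are made up.

# Illustration: embed sentences and compare their meanings numerically
import numpy as np
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

v1 = np.array(embeddings.embed_query("The contract expires in December."))
v2 = np.array(embeddings.embed_query("The agreement ends at the end of the year."))
v3 = np.array(embeddings.embed_query("I love watching movies on weekends."))

def cosine(a, b):
    # Cosine similarity: values closer to 1 mean more similar meaning
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(v1, v2))  # higher: the sentences mean similar things
print(cosine(v1, v3))  # lower: unrelated sentences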

Data preparation

Preparing a content store for the LLM will take some steps, as we just saw. So, let’s start by creating a function that can load a file and split it into text chunks for efficient retrieval.

# Imports
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_document(pdf):
    """
    Load a PDF and split it into chunks for efficient retrieval.

    :param pdf: PDF file to load
    :return: List of chunks of text
    """
    # Load the PDF pages
    loader = PyPDFLoader(pdf)
    docs = loader.load()

    # Instantiate the text splitter with a chunk size of 500 characters and an
    # overlap of 100 characters so that context is not lost between chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    # Split into chunks for efficient retrieval
    chunks = text_splitter.split_documents(docs)

    # Return the list of chunks
    return chunks
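A quick way to sanity-check the function is to run it on any local PDF and inspect the output (the file name below is just an example):

# Example usage (assumes a local file called "sample.pdf")
chunks = load_document("sample.pdf")

print(f"Number of chunks: {len(chunks)}")
print(chunks[0].page_content[:200])  # first 200 characters of the first chunk
print(chunks[0].metadata)            # source file and page info added by PyPDFLoader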

Next, we will start building our Streamlit app, and we’ll use that function in the next script.

Web application

We will begin importing the necessary modules in Python. Most of those will come from the langchain packages.

FAISS is used for document retrieval; OpenAIEmbeddings transforms the text chunks into numerical vectors so that similarity between them can be computed; ChatOpenAI is what enables us to interact with the OpenAI API; create_retrieval_chain is what actually performs the RAG step, retrieving relevant chunks and augmenting the LLM’s input with them; create_stuff_documents_chain glues the model to the ChatPromptTemplate.

Note: You will need to generate an OpenAI API key to be able to run this script. If it’s the first time you’re creating your account, you may get some free credits. But if you have had it for some time, it is possible that you will have to add 5 dollars in credits to be able to access OpenAI’s API. An alternative is to use Hugging Face embeddings.

# Imports
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.chains import create_retrieval_chain
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from scripts.secret import OPENAI_KEY
from scripts.document_loader import load_document
import streamlit as st
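The scripts.secret import assumes a small local module that holds your API key. Its contents are not shown here, but a minimal sketch could look like this (reading the key from an environment variable keeps it out of version control):

# scripts/secret.py (hypothetical contents; do not commit real keys)
import os

# Expects an environment variable, e.g. export OPENAI_API_KEY="sk-..."
OPENAI_KEY = os.environ["OPENAI_API_KEY"]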

This first code snippet creates the app title, creates a box for the file upload, and prepares the uploaded file to be passed to the load_document() function.

# Create a Streamlit app
st.title("AI-Powered Document Q&A")

# Load document to streamlit
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

# If a file is uploaded, create the TextSplitter and vector database
if uploaded_file:

    # Code to work around document loader from Streamlit and make it readable by langchain
    temp_file = "./temp.pdf"
    with open(temp_file, "wb") as file:
        file.write(uploaded_file.getvalue())
        file_name = uploaded_file.name

    # Load document and split it into chunks for efficient retrieval.
    chunks = load_document(temp_file)

    # Message user that document is being processed with time emoji
    st.write("Processing document... :watch:")

Machines understand numbers better than text, so in the end, we have to provide the model with a database of numbers that it can compare and check for similarity when answering a query. That’s where embeddings come in: we use them to create the vector_db in this next piece of code.

# Generate embeddings
    # Embeddings are numerical vector representations of data, typically used to capture relationships, similarities,
    # and meanings in a way that machines can understand. They are widely used in Natural Language Processing (NLP),
    # recommender systems, and search engines.
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_KEY,
                                  model="text-embedding-ada-002")

    # Can also use HuggingFaceEmbeddings
    # from langchain_huggingface.embeddings import HuggingFaceEmbeddings
    # embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # Create vector database containing chunks and embeddings
    vector_db = FAISS.from_documents(chunks, embeddings)
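Optionally, you can persist the index so that the same document isn’t re-embedded on every run. A sketch, assuming a local folder name of your choice (newer langchain_community versions require the allow_dangerous_deserialization flag when reloading):

    # Optional: save the index to disk and reload it on a later run
    vector_db.save_local("faiss_index")
    # vector_db = FAISS.load_local("faiss_index", embeddings,
    #                              allow_dangerous_deserialization=True)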

Next, we create a retriever object to search the vector_db, and we instantiate the LLM that will generate the answers.

# Create a document retriever
    retriever = vector_db.as_retriever()
    llm = ChatOpenAI(model_name="gpt-4o-mini", openai_api_key=OPENAI_KEY)
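If you want control over how many chunks are passed to the model, as_retriever accepts search parameters; for example (the value 4 here is arbitrary):

    # Retrieve the 4 most similar chunks for each question
    retriever = vector_db.as_retriever(search_kwargs={"k": 4})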

Then, we create the system_prompt, which is a set of instructions that tells the LLM how to answer, and a prompt template that prepares the user’s input to be passed to the model.

# Create a system prompt
    # It sets the overall context for the model.
    # It influences tone, style, and focus before user interaction starts.
    # Unlike user inputs, a system prompt is not visible to the end user.

    system_prompt = (
        "You are a helpful assistant. Use the given context to answer the question. "
        "If you don't know the answer, say you don't know. "
        "{context}"
    )

    # Create a prompt Template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{input}"),
        ]
    )

    # Create a chain
    # It creates a StuffDocumentsChain, which takes multiple documents (text data) and "stuffs" them together before passing them to the LLM for processing.

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
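If you’re curious about what the model will actually receive, you can fill the template with dummy values before wiring everything up; a quick, purely illustrative check:

    # Inspect how {context} and {input} are filled in (dummy values for illustration)
    example = prompt.invoke({"context": "retrieved chunks would go here",
                             "input": "What is this document about?"})
    print(example.to_messages())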

Moving on, we create the core of the RAG framework, combining the retriever object with the question-answer chain. This chain retrieves relevant documents from the data source (the vector database) and passes them, together with the user’s question, to the LLM to generate a response.

# Create the RAG chain
    chain = create_retrieval_chain(retriever, question_answer_chain)

Finally, we create the variable question for the user input. If this question box is filled with a query, we pass it to the chain, which calls the LLM, and the returned response is printed on the app’s screen.

# Streamlit input for question
    question = st.text_input("Ask a question about the document:")
    if question:
        # Answer
        response = chain.invoke({"input": question})['answer']
        st.write(response)
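The dictionary returned by the retrieval chain also contains the retrieved chunks under the "context" key, which is handy for showing users where the answer came from. A small, optional variation of the snippet above:

    if question:
        response = chain.invoke({"input": question})
        st.write(response["answer"])

        # Optionally show the chunks that grounded the answer
        with st.expander("Sources used"):
            for doc in response["context"]:
                st.write(doc.page_content)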

Here is a screenshot of the result.

Screenshot of the AI-Powered Document Q&A
Screenshot of the final app. Image by the author.

And this is a GIF for you to see the File Reader AI Assistant in action!

GIF of the File Reader AI Assistant in action
File Reader AI Assistant in action. Image by the author.

Before you go

In this project, we learned what the RAG framework is and how it helps the LLM perform better, especially on specific knowledge it was not trained on.

AI can be powered with knowledge from an instruction manual, a company’s databases, finance files, or contracts, and then respond accurately to domain-specific queries. The knowledge base is augmented with a content store.

To recap, this is how the framework works:

1️⃣ User Query → Input text is received.

2️⃣ Retrieve Relevant Documents → Searches a knowledge base (e.g., a database, vector store).

3️⃣ Augment Context → Retrieved documents are added to the input.

4️⃣ Generate Response → An LLM processes the combined input and produces an answer.

GitHub repository

https://github.com/gurezende/Basic-Rag

About me

If you liked this content and want to learn more about my work, here is my website, where you can also find all my contacts.

https://gustavorsantos.me

References

https://cloud.google.com/use-cases/retrieval-augmented-generation

https://www.ibm.com/think/topics/retrieval-augmented-generation

https://youtu.be/T-D1OfcDW1M?si=G0UWfH5-wZnMu0nw

https://python.langchain.com/docs/introduction

https://www.geeksforgeeks.org/how-to-get-your-own-openai-api-key
