Stay Ahead, Stay ONMINE

LLM + RAG: Creating an AI-Powered File Reader Assistant

Introduction

AI is everywhere. 

It is hard not to interact at least once a day with a Large Language Model (LLM). The chatbots are here to stay. They’re in your apps, they help you write better, they compose emails, they read emails…well, they do a lot.

And I don’t think that that is bad. In fact, my opinion is the other way – at least so far. I defend and advocate for the use of AI in our daily lives because, let’s agree, it makes everything much easier.

I don’t have to spend time double-reading a document to find punctuation problems or typos. AI does that for me. I don’t waste time writing that follow-up email every single Monday. AI does that for me. I don’t need to read a huge and boring contract when I have an AI to summarize the main takeaways and action points for me!

These are only some of AI’s great uses. If you’d like to know more use cases of LLMs to make our lives easier, I wrote a whole book about them.

Now, thinking as a data scientist and looking at the technical side, not everything is that bright and shiny. 

LLMs are great for several general use cases that apply to anyone or any company: for example, coding, summarizing, or answering questions about general content created before the training cutoff date. However, when it comes to specific business applications, a single narrow purpose, or something new that didn’t make the cutoff date, the models won’t be that useful out of the box, meaning they will not know the answer. Thus, they will need adjustments.

Training an LLM can take months and millions of dollars. What is even worse is that if we don’t adapt the model to our purpose, there will be unsatisfactory results or hallucinations (when the model’s response sounds plausible but is incorrect or not supported by any real source).

So what is the solution, then? Spending a lot of money retraining the model to include our data?

Not really. That’s when Retrieval-Augmented Generation (RAG) becomes useful.

RAG is a framework that combines getting information from an external knowledge base with large language models (LLMs). It helps AI models produce more accurate and relevant responses.

Let’s learn more about RAG next.

What is RAG?

Let me tell you a story to illustrate the concept.

I love movies. For some time in the past, I knew which movies were competing for the best movie category at the Oscars or the best actors and actresses. And I would certainly know which ones got the statue for that year. But now I am all rusty on that subject. If you asked me who was competing, I would not know. And even if I tried to answer you, I would give you a weak response. 

So, to provide you with a quality response, I will do what everybody else does: search for the information online, obtain it, and then give it to you. What I just did is the same idea as RAG: I obtained data from an external source to give you an answer.

When we enhance the LLM with a content store where it can go and retrieve data to augment (increase) its knowledge base, that is the RAG framework in action.

RAG is like creating a content store where the model can enhance its knowledge and respond more accurately.

Diagram: User prompts and content using LLM + RAG
User prompt about Content C. LLM retrieves external content to aggregate to the answer. Image by the author.

Summarizing, RAG:

  1. Uses search algorithms to query external data sources, such as databases, knowledge bases, and web pages.
  2. Pre-processes the retrieved information.
  3. Incorporates the pre-processed information into the LLM.

Why use RAG?

Now that we know what the RAG framework is, let’s understand why we should be using it.

Here are some of the benefits:

  • RAG enhances factual accuracy by referencing real data.
  • RAG can help LLMs process and consolidate knowledge to create more relevant answers.
  • RAG can help LLMs access additional knowledge bases, such as internal organizational data.
  • RAG can help LLMs create more accurate domain-specific content.
  • RAG can help reduce knowledge gaps and AI hallucination.

As previously explained, I like to say that with the RAG framework, we are giving the LLM an internal search engine for the content we want to add to its knowledge base.

Well. All of that is very interesting. But let’s see an application of RAG. We will learn how to create an AI-powered PDF Reader Assistant.

Project

This is an application that allows users to upload a PDF document and ask questions about its content using AI-powered natural language processing (NLP) tools. 

  • The app uses Streamlit as the front end.
  • The backend uses LangChain, an OpenAI chat model (gpt-4o-mini in the code below), and FAISS (Facebook AI Similarity Search) for document retrieval and question answering.

Let’s break down the steps for better understanding:

  1. Load a PDF file and split it into chunks of text.
    1. This optimizes the data for retrieval.
  2. Present the chunks to an embedding tool.
    1. Embeddings are numerical vector representations of data used to capture relationships, similarities, and meanings in a way that machines can understand. They are widely used in Natural Language Processing (NLP), recommender systems, and search engines.
  3. Put those chunks of text and their embeddings in the same database for retrieval.
  4. Finally, make that database available to the LLM.
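
To give a feeling for what "similarity" between embeddings means, here is a minimal sketch in plain Python. The three vectors are made up for illustration (real embedding models return hundreds or thousands of dimensions), and the helper name cosine_similarity is my own, not part of any library used in this project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" (assumption: chosen by hand for illustration)
contract = [0.9, 0.1, 0.0]
agreement = [0.8, 0.2, 0.1]
recipe = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(contract, agreement)
sim_unrelated = cosine_similarity(contract, recipe)

# Texts with related meaning should score higher than unrelated ones
print(round(sim_related, 3), round(sim_unrelated, 3))
```

A retriever relies on this kind of comparison: the question is embedded too, and the chunks whose vectors are closest to it are the ones handed to the LLM.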

Data preparation

Preparing a content store for the LLM will take some steps, as we just saw. So, let’s start by creating a function that can load a file and split it into text chunks for efficient retrieval.

# Imports
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter


def load_document(pdf):
    """
    Load a PDF and split it into chunks for efficient retrieval.

    :param pdf: Path to the PDF file to load
    :return: List of chunks of text
    """
    # Load the PDF; PyPDFLoader returns one document per page
    loader = PyPDFLoader(pdf)
    docs = loader.load()

    # Instantiate the text splitter with a chunk size of 500 characters and an
    # overlap of 100 characters, so that context is not lost at chunk boundaries
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)

    # Split into chunks for efficient retrieval
    chunks = text_splitter.split_documents(docs)

    return chunks
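
To make chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. This is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences); the helper chunk_text is hypothetical and only shows how consecutive chunks share some text.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; sizes are counted in characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Tiny sizes so the overlap is easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each chunk starts with the last 4 characters of the previous one. That shared text is what keeps a sentence that falls on a boundary from losing its context.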

Next, we will start building our Streamlit app, and we’ll use that function in the next script.

Web application

We will begin by importing the necessary modules in Python. Most of those will come from the langchain packages.

FAISS is the vector store used for document retrieval; OpenAIEmbeddings transforms the text chunks into numerical vectors, so that the similarity between a question and the chunks can be calculated; ChatOpenAI is what enables us to interact with the OpenAI API; create_retrieval_chain is what actually does the RAG, retrieving documents and augmenting the LLM’s input with that data; create_stuff_documents_chain glues the model and the ChatPromptTemplate.

Note: You will need to generate an OpenAI API key to be able to run this script. If it’s the first time you’re creating your account, you may get some free credits. But if you have had the account for some time, it is possible that you will have to add 5 dollars in credits to be able to access OpenAI’s API. An option for the embeddings is using Hugging Face’s embedding models.

# Imports
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.chains import create_retrieval_chain
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from scripts.secret import OPENAI_KEY
from scripts.document_loader import load_document
import streamlit as st
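
The import from scripts.secret above assumes a local module that holds the key. As an alternative sketch, the key can be read from an environment variable so it never lands in the repository. The helper get_openai_key is hypothetical, and the DEMO_OPENAI_KEY variable is only a placeholder so the snippet runs on its own; in a real setup you would read OPENAI_API_KEY.

```python
import os


def get_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable, or fail with a clear message.

    Hypothetical helper; the article itself imports OPENAI_KEY from scripts.secret.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the app.")
    return key


# Placeholder value so this sketch runs without a real key (assumption for the demo)
os.environ.setdefault("DEMO_OPENAI_KEY", "sk-placeholder")
OPENAI_KEY = get_openai_key("DEMO_OPENAI_KEY")
print(OPENAI_KEY[:3] + "...")
```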

This first code snippet will create the app title, create a box for file upload, and prepare the file to be passed to the load_document() function.

# Create a Streamlit app
st.title("AI-Powered Document Q&A")

# Load document to streamlit
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

# If a file is uploaded, create the TextSplitter and vector database
if uploaded_file:

    # Streamlit keeps the upload in memory, while the langchain loader expects a file path,
    # so we write the bytes to a temporary file on disk first
    temp_file = "./temp.pdf"
    with open(temp_file, "wb") as file:
        file.write(uploaded_file.getvalue())
    file_name = uploaded_file.name

    # Message user that document is being processed with time emoji
    st.write("Processing document... :watch:")

    # Load document and split it into chunks for efficient retrieval.
    chunks = load_document(temp_file)
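
One caveat of writing to a fixed ./temp.pdf is that every upload overwrites the same file. A possible variation, sketched here with Python's standard tempfile module, writes the bytes to a uniquely named temporary file instead. The helper save_upload_to_temp is hypothetical; in the app, the bytes would come from uploaded_file.getvalue().

```python
import os
import tempfile


def save_upload_to_temp(data: bytes, suffix: str = ".pdf") -> str:
    """Write uploaded bytes to a uniquely named temporary file and return its path."""
    # delete=False keeps the file on disk after the with-block, so a loader can open it by path
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


# Placeholder bytes standing in for uploaded_file.getvalue()
path = save_upload_to_temp(b"%PDF-1.4 placeholder bytes")
try:
    with open(path, "rb") as handle:
        print(handle.read(8))
finally:
    os.remove(path)  # clean up once the document has been loaded
```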

Machines understand numbers better than text, so in the end, we will have to provide the model with a database of numbers that it can compare and check for similarity when performing a query. That’s where the embeddings will be useful to create the vector_db, in this next piece of code.

    # Generate embeddings
    # Embeddings are numerical vector representations of data, typically used to capture relationships, similarities,
    # and meanings in a way that machines can understand. They are widely used in Natural Language Processing (NLP),
    # recommender systems, and search engines.
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_KEY,
                                  model="text-embedding-ada-002")

    # Can also use HuggingFaceEmbeddings
    # from langchain_huggingface.embeddings import HuggingFaceEmbeddings
    # embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # Create vector database containing chunks and embeddings
    vector_db = FAISS.from_documents(chunks, embeddings)
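
Conceptually, a vector database stores each chunk next to its embedding and, at query time, returns the chunks whose vectors are closest to the query vector. The sketch below is a toy, brute-force version of that idea in plain Python. It is not FAISS and makes no claim about FAISS internals; the class TinyVectorStore, the 2-dimensional vectors, and the sample sentences are all made up for illustration.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs and searches by brute force."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, vector, text):
        self._vectors.append(vector)
        self._texts.append(text)

    def search(self, query_vector, k=2):
        # Euclidean distance between the query and every stored vector
        scored = [
            (math.dist(query_vector, vector), text)
            for vector, text in zip(self._vectors, self._texts)
        ]
        scored.sort(key=lambda pair: pair[0])  # smallest distance first
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
store.add([0.9, 0.1], "Payment is due within 30 days.")
store.add([0.8, 0.3], "Late payments incur a 2% fee.")
store.add([0.1, 0.9], "The office is closed on holidays.")

# A query vector that sits close to the two payment-related chunks
top_chunks = store.search([0.9, 0.15], k=2)
print(top_chunks)
```

Comparing against every stored vector is fine for a single PDF. Libraries like FAISS exist because that approach gets slow as the number of vectors grows.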

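Next, we create a retriever object to search the vector_db, and the llm object that will generate the answers.

    # Create a document retriever
    retriever = vector_db.as_retriever()

    # Create the chat model that will generate the answers
    llm = ChatOpenAI(model_name="gpt-4o-mini", openai_api_key=OPENAI_KEY)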

Then, we will create the system_prompt, which is a set of instructions to the LLM on how to answer, and we will create a prompt template, preparing it to be added to the model once we get the input from the user.

    # Create a system prompt
    # It sets the overall context for the model.
    # It influences tone, style, and focus before user interaction starts.
    # Unlike user inputs, a system prompt is not visible to the end user.

    system_prompt = (
        "You are a helpful assistant. Use the given context to answer the question. "
        "If you don't know the answer, say you don't know.\n\n"
        "{context}"
    )

    # Create a prompt Template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{input}"),
        ]
    )

    # Create a chain
    # It creates a chain that takes multiple documents (text data) and "stuffs" them together
    # into the {context} placeholder of the prompt before passing them to the LLM for processing.

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
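
To see what "stuffing" means in practice, here is a plain-Python sketch of the idea: the retrieved chunks are joined into one string and placed into the {context} slot of the system prompt. This only imitates the concept, it is not LangChain's implementation. The helper names and the blank-line separator are assumptions for illustration.

```python
SYSTEM_TEMPLATE = (
    "You are a helpful assistant. Use the given context to answer the question. "
    "If you don't know the answer, say you don't know.\n\n"
    "{context}"
)


def stuff_documents(chunks: list[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into a single context string (assumed separator)."""
    return separator.join(chunks)


def build_messages(chunks: list[str], question: str) -> list[tuple[str, str]]:
    """Fill the system template with the stuffed context and attach the user question."""
    context = stuff_documents(chunks)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


messages = build_messages(
    ["Payment is due within 30 days.", "Late payments incur a 2% fee."],
    "When is payment due?",
)
for role, content in messages:
    print(f"[{role}] {content}")
```

Because everything is pasted into a single prompt, this strategy works as long as the retrieved chunks fit in the model's context window.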

Moving on, we create the core of the RAG framework, pasting together the retriever object and the question-answering chain. The resulting object fetches relevant documents from a data source (e.g., a vector database) and passes them, together with the question, to the LLM to generate a response.

    # Creates the RAG
    chain = create_retrieval_chain(retriever, question_answer_chain)

Finally, we create the variable question for the user input. If this question box is filled with a query, we pass it to the chain, which calls the LLM to process and return the response, which will be printed on the app’s screen.

    # Streamlit input for question
    question = st.text_input("Ask a question about the document:")
    if question:
        # Answer
        response = chain.invoke({"input": question})['answer']
        st.write(response)

Here is a screenshot of the result.

Screenshot of the AI-Powered Document Q&A
Screenshot of the final app. Image by the author.

And this is a GIF for you to see the File Reader AI Assistant in action!

GIF of the File Reader AI Assistant in action
File Reader AI Assistant in action. Image by the author.

Before you go

In this project, we learned what the RAG framework is and how it helps the LLM perform better and also handle specific knowledge well.

AI can be powered with knowledge from an instruction manual, databases from a company, some finance files, or contracts, and then respond accurately to domain-specific content queries, without the model being retrained. The knowledge base is augmented with a content store.

To recap, this is how the framework works:

1️⃣ User Query → Input text is received.

2️⃣ Retrieve Relevant Documents → Searches a knowledge base (e.g., a database, vector store).

3️⃣ Augment Context → Retrieved documents are added to the input.

4️⃣ Generate Response → An LLM processes the combined input and produces an answer.
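
The four steps above can be sketched end to end in a few lines of plain Python. This is a toy under loud assumptions: word overlap stands in for embedding similarity, the generate function is a stand-in that does not call any model, and the knowledge base sentences are invented for the example.

```python
import re

# Invented knowledge base (assumption for the demo)
KNOWLEDGE_BASE = [
    "Payment is due within 30 days of the invoice date.",
    "Either party may terminate the contract with 60 days notice.",
    "The office is closed on public holidays.",
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Step 2: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def augment(query, retrieved_docs):
    """Step 3: add the retrieved text to the input."""
    context = "\n".join(retrieved_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Step 4: stand-in for the LLM call; it only reports the prompt size."""
    return f"(an LLM would answer here, given a prompt of {len(prompt)} characters)"


def answer(query):
    retrieved_docs = retrieve(query, KNOWLEDGE_BASE)
    prompt = augment(query, retrieved_docs)
    return retrieved_docs, generate(prompt)


# Step 1: the user query
retrieved, response = answer("When is the payment due?")
print(retrieved)
print(response)
```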

GitHub repository

https://github.com/gurezende/Basic-Rag

About me

If you liked this content and want to learn more about my work, here is my website, where you can also find all my contacts.

https://gustavorsantos.me

References

https://cloud.google.com/use-cases/retrieval-augmented-generation

https://www.ibm.com/think/topics/retrieval-augmented-generation

https://youtu.be/T-D1OfcDW1M?si=G0UWfH5-wZnMu0nw

https://python.langchain.com/docs/introduction

https://www.geeksforgeeks.org/how-to-get-your-own-openai-api-key

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

AI power efficiency the target of Lotus Microsystems energy advances

By shortening current paths and integrating thermal management directly into the power-delivery structure, vStrata aims to reduce conversion losses while improving cooling efficiency. According to Lotus Microsystems, the module can achieve point-of-load efficiencies of up to 96% while reducing power-conversion losses by more than 50% compared with conventional approaches. “We

Read More »

Zscaler launches zero trust platform for agentic AI

The company will be extending its Zscaler Zero Trust Exchange platform to cover AI agents, including how they connect, how they access data, and how they run on devices. According to Christina Powers, partner and cybersecurity consulting leader at management consulting firm West Monroe Partners, zero trust for agentic systems

Read More »

Energy Department Releases Finalized Fusion Science and Technology Roadmap to Accelerate Commercial Fusion Power

WASHINGTON—The U.S. Department of Energy (DOE) today released the finalized Fusion Science and Technology (FS&T) Roadmap, a national strategy to accelerate the development and commercialization of fusion energy on the most rapid, responsible timeline in history. Building on earlier roadmap efforts, the finalized roadmap brings together fusion science, technology, infrastructure, workforce development, and commercialization priorities into a single national strategy to support fusion pilot plants and commercial fusion power in the mid-2030s. Fusion is the process that powers the sun and stars. For decades, scientists and engineers have worked to bring that same process to Earth as a source of abundant, reliable energy. The finalized roadmap outlines how DOE, industry, universities, and national laboratories will work together to accelerate the path toward commercial fusion energy in the United States. This effort advances President Trump’s energy dominance agenda and reinforces the Administration’s commitment to expanding reliable American energy production, strengthening domestic supply chains, and maintaining U.S. leadership in critical technologies. By accelerating progress toward commercial fusion power, DOE is helping secure a future of abundant and reliable energy. “Fusion energy has entered a new era defined by extraordinary scientific progress and public-private momentum,” said DOE Under Secretary for Science Dr. Darío Gil. “With this roadmap, we now have the clarity, coordination, and sustained commitment needed to turn the promise of fusion into a reality for the American people.” Developed with input from more than 800 scientists and engineers across the public and private sectors, the finalized FS&T Roadmap reflects contributions from more than 15 private companies, over 10 National Laboratories, and more than 70 universities. 
The roadmap identifies the critical science and technology gaps that must be closed to realize fusion pilot plants and strengthen U.S. leadership in the global fusion industry. The FS&T Roadmap establishes a unified strategy for the U.S.

Read More »

Aramco to divest Malaysian refining assets

Petroliam Nasional Bhd. (PETRONAS) subsidiary PETRONAS Refinery & Petrochemical Corp. Sdn. Bhd. (PRPC) has agreed to buyout Saudi Arabian Oil Co.’s (Aramco) equity interests in the partners’ dual 50-50 joint ventures responsible for operating the 300,000-b/d integrated refining and petrochemical refinery of the Pengerang Integrated Complex (PIC) in southeastern Johor, Malaysia. Subject to fulfillment of customary closing conditions, Petronas will take 100% ownership and become full operator of Pengerang Refining Co. Sdn. Bhd. and Pengerang Petrochemical Co. Sdn. Bhd., collectively known as PRefChem, Aramco and Petronas said in separate releases. Aramco said divestment of the Malaysian assets will support the strategic optimization of the company’s own downstream portfolio by providing additional flexibility to pursue investments aligned with its broader downstream strategy. While Aramco will no longer hold ownership in the Malaysian ventures, the company said it will continue actively explore commercial arrangements with Petronas following the sale, including continuing its existing agreement to supply Saudi Arabian crude oil to the site, as well as opportunities related to technology exchange and integrated product distribution. Petronas said its move to take full control of the downstream assets will allow the company to further enhance operational alignment and flexibility across PRefChem’s value chain, while harnessing its international oil supply network and integrated operating model to support continued reliability and resilience across varying market conditions. Full ownership of PRefChem’s in-country operations also will strengthen Petronas’ ability to support Malaysia’s long-term energy security and industry resilience, the operator said. A definitive timeframe for when the parties expect to finalize the proposed transaction was not revealed. 
PRefChem operations In addition to the Johor refinery, PRefChem’s operations at PIC include a steam cracker complex equipped to produce 3.4 million tonnes/year (tpy) combined of ethylene, propylene, butadiene, benzene and raffinate-2. PRefChem also operates an associated petrochemical complex at the

Read More »

Delfin Midstream takes $5-billion FID for first FLNG vessel

Delfin Midstream Inc., Houston, has taken a final investment decision (FID) for the first floating liquefied natural gas (FLNG) vessel of the Delfin LNG project under development in Louisiana and offshore in the Gulf of Mexico. Delfin FLNG 1 will be the first FLNG vessel in the US and the largest FLNG project globally, with an expected export capacity of 4.4 million tonnes/year (tpy) of LNG, the company said in its June 3 release. Concurrent with the FID, a group of investors led by Global Infrastructure Partners (GIP), a part of BlackRock, and including existing Delfin investors Mitsui OSK Lines Ltd. (MOL), Vitol, and Diameter Capital Partners, has agreed to invest in the first phase of the project. The vessel is backed by long-term LNG sales agreements with Vitol, Expand Energy, Centrica, and Gunvor, Delfin said, and all necessary permits and licenses required to begin construction have been secured. Construction contracts have been executed with Samsung Heavy Industries Co. Ltd. and Black & Veatch. LNG production is scheduled to begin


Chevron files $13.8-billion Argentina oil development proposal

Chevron Corp. applied June 2 to join Argentina’s Large Investment Incentive Regime (RIGI) for a $13.8-billion unconventional oil development at its 100% operated El Trapial-Este block in northern Neuquén province. Until recently, RIGI had attracted about $93 billion across 36 projects. Chevron’s application, which remains subject to government approval, is equivalent to almost one seventh of that total. The filing, which does not constitute a final investment decision, is Chevron’s largest individual investment proposal in Argentina since it entered the country in 1999 and the second-largest project submitted under RIGI, behind YPF SA’s $25-billion LLL Oil development. Chevron said it is targeting production of about 30,000 b/d from El Trapial-Este, subject to the availability of takeaway infrastructure. The block currently produces about 7,000 b/d. Chevron tested the block with a 7-well pilot in 2021 and has been carrying out development since late 2022, using laterals of more than 3,000 m and techniques transferred from the US Permian basin. In 2023, Chevron committed $500 million to that phase. During the company’s first-quarter earnings call on May 1, chief executive officer Mike Wirth anchored Chevron’s 2030 targets in “assets that are operating today.” El Trapial-Este was not explicitly identified among assets described as the main base for those targets. Wirth also said Chevron would not accelerate Permian production even with Brent above $100/bbl, preferring to manage that asset for free cash flow rather than volume. In the same presentation, Wirth named Argentina among the sources of equity crude that feed Chevron’s global refining system, along with Tengiz, Guyana, the Permian, and Venezuela. The earnings call came weeks before the El Trapial-Este proposal filing.
Vaca Muerta costs, takeaway capacity  Breakeven costs in Vaca Muerta’s best blocks are about $40/bbl at the wellhead, according to Rystad Energy, while normalized well productivity—adjusted for lateral length and fracture


Santos lets rig contract for Bedout subbasin exploration campaign

Santos Ltd. has let a contract for the Transocean Equinox semisubmersible mobile offshore drilling unit for a multi-well campaign at Bedout subbasin exploration permits offshore North West Shelf Australia, said partner Carnarvon Energy Ltd. in a June 1 release. The objective of the 2027 Bedout exploration campaign is to define the scale of the subbasin’s resource potential and target some of the largest prospects in the exploration portfolio. Shortlisted prospects include Ara, Yuma, Goats Eye, and Hutton, which are all defined on the Bedout MegaMerge 3D seismic survey. The Bedout exploration campaign is on track to begin in April 2027, with one firm well and one contingent well. The Transocean Equinox is currently engaged in a multi-well exploration drilling campaign off the coast of Victoria, which is expected to be completed by early 2027. Bedout basin is proposed to be an integrated gas and liquids project. To date, five fields have been discovered. Net 2C contingent resource of 230 MMboe is booked as of Dec. 31, 2024. Santos is operator at Bedout. Carnarvon holds 20% interest in Yuma, Goats Eye, and Hutton, and 10% interest in Ara.


US underground natural gas storage capacity edges higher in 2025

Underground working natural gas storage capacity in the Lower 48 states increased modestly in 2025, with most additions concentrated in the South Central and Mountain regions, according to the US Energy Information Administration (EIA). Natural gas storage plays a key role in balancing seasonal demand fluctuations, allowing supplies to be injected during periods of lower consumption and withdrawn during periods of peak demand. EIA calculates natural gas storage capacity in two ways: demonstrated peak capacity and working gas design capacity. Both increased in 2025. EIA data show demonstrated peak storage capacity rose by 6 bcf, or 0.1%, from the previous year, marking the third consecutive annual increase. Demonstrated peak capacity is the sum of the largest volume of working gas stored in each storage field during the previous five-year period, regardless of when the peaks occurred. The South Central and Mountain regions posted the largest gains, with demonstrated peak capacity increasing by 16 bcf and 18 bcf, respectively. Capacity declined in other regions, falling 15 bcf in the East, 8 bcf in the Pacific, and 5 bcf in the Midwest. Working gas design capacity, sometimes referred to as nameplate capacity, is based on the physical characteristics of the reservoir, installed equipment, and operating procedures on the site, which federal or state regulators usually must certify.  As of November, Lower 48 design capacity totaled 4,683 bcf, up 26 bcf from a year earlier. The South Central region accounted for most of the increase, adding 21 bcf of design capacity, while the Mountain region added 6 bcf. Design capacity in the East declined by 2 bcf, primarily because of base gas adjustments. Capacity in the Midwest and Pacific regions was unchanged from the previous year.


Arista unveils 1.6T rack-scale switch family for AI infrastructure

The new Arista family joins a growing ecosystem of vendors looking to tap into the 1.6T Ethernet world, which includes Cisco, Nvidia, Celestica and others. “Arista Network’s new 7060XE7 Series is a strong signal of where large-scale AI fabrics are heading: higher bandwidth, better power efficiency, and tighter integration between compute, optics, silicon, cooling, and network operating software,” wrote Sameh Boujelbene, vice president, data center switch and AI networks market research for Dell Oro, in a LinkedIn post. Among the features that stand out to her are “strong customer and ecosystem validation from Microsoft Azure, Oracle Cloud Infrastructure, Meta, AMD, and Broadcom.”


Water Emerges as a Critical Constraint for AI Data Centers

“There really has been a major shift within the last couple of years,” Bajpayee said. “I would even say within the last 12 months is where we have seen suddenly a rapid increase in the data center operators’ desire to control their water destiny.” For Gradiant, the MIT-born water technology company that built its reputation serving semiconductor manufacturers, pharmaceutical companies, and industrial customers worldwide, that shift has translated into a rapidly expanding pipeline of data center opportunities. More importantly, Bajpayee believes it signals a fundamental change in how the industry thinks about water itself. The conversation is no longer centered primarily on sustainability metrics or corporate environmental goals. Instead, operators increasingly view water as a business continuity issue. “We’re seeing operators themselves come to us and tell us that these are issues they are facing,” Bajpayee said. “They want to make sure they don’t get stalled, their permits don’t get pulled, their business doesn’t get stopped, and communities don’t push them out because they didn’t figure out a way to control their water.” From Water Treatment to Water Strategy That shift is occurring as Gradiant expands deployments of its recently announced HyperSolved platform, an end-to-end cooling water management system purpose-built for AI data centers. The company says HyperSolved is now being deployed with several of the world’s largest hyperscale operators across North America, Europe, and Asia, reflecting growing industry demand for integrated approaches to water infrastructure. While compute, networking, and power systems have evolved rapidly during the AI era, water management often remains fragmented, requiring operators to coordinate multiple vendors responsible for sourcing, treatment, cooling, wastewater management, reuse, discharge, and regulatory compliance. 
Gradiant’s approach seeks to consolidate those functions into a single integrated platform and operating model. The timing reflects the growing scale of the challenge. New AI data center


Data Center Jobs: Engineering, Construction, Commissioning, Sales, Field Service and Facility Tech Jobs Available in Major Data Center Hotspots

Each month Data Center Frontier, in partnership with Pkaza, posts some of the hottest data center career opportunities in the market. Here’s a look at some of the latest data center jobs posted on the Data Center Frontier jobs board, powered by Pkaza Critical Facilities Recruiting. Looking for Data Center Candidates? Check out Pkaza’s Active Candidate / Featured Candidate Hotlist  Mechanical Applications Engineer Pittsburgh, PA This position is also available in: Denver, CO; Richmond, VA and Georgetown, SC (live by the beach!). Relo available. Our client is a leading provider and manufacturer of industrial HVAC mechanical equipment used in industrial cooling applications for mission critical operations. They help their customers save money by reducing energy and operating costs and provide solutions for modernizing their customers’ existing mechanical infrastructure. This company provides cooling solutions to many of the world’s largest organizations and government facilities and enterprise clients, colocation providers and hyperscale companies. This career-growth minded opportunity offers exciting projects with leading-edge technology and innovation as well as competitive salaries and benefits. Electrical Commissioning Engineer New Albany, OH This traveling position is also available in: New York, NY; White Plains, NY; Dallas, TX; Richmond, VA; Ashburn, VA; Montvale, NJ; Charlotte, NC; Atlanta, GA; Hampton, GA; Cedar Rapids, IA; Phoenix, AZ; Salt Lake City, UT; Kansas City, MO; Omaha, NE; Chesterton, IN; Indianapolis, IN or Chicago, IL. *** ALSO looking for a LEAD EE and ME CxA Agents and CxA PMs ***  Our client is an engineering design and commissioning company that has a national footprint and specializes in MEP critical facilities design. They provide design, commissioning, consulting and management expertise in the critical facilities space. 
They have a mindset to provide reliability, energy efficiency, sustainable design and LEED expertise when providing these consulting services for Enterprise, Colocation and Hyperscale Companies. This career-growth minded opportunity offers exciting projects


Fiber’s Next Act: How AI Is Driving Connectivity Closer to the Edge

ORLANDO, Fla. — Much of the conversation surrounding AI infrastructure has focused on GPUs, power generation, cooling systems, and the unprecedented scale of next-generation data center development. But at Fiber Connect 2026, another reality became increasingly clear: none of those investments matter without the network infrastructure required to connect them. That theme emerged repeatedly during a conversation between Data Center Frontier Editor-in-Chief Matt Vincent and Clearfield Chief Commercial Officer Anis Khemakhem, whose perspective sits at the intersection of broadband infrastructure, fiber deployment, and emerging AI connectivity requirements. While Clearfield is best known throughout the broadband industry for its fiber management and connectivity solutions, Khemakhem argued that AI’s rapid expansion is creating new opportunities, and new challenges, that extend well beyond traditional fiber-to-the-home deployments. “AI is driving that connectivity closer and closer to the edge,” Khemakhem said, noting that growing compute requirements and increasingly latency-sensitive workloads are fundamentally changing assumptions about where infrastructure must reside and how it must be connected. For Data Center Frontier readers, the significance lies in a growing realization that AI infrastructure is becoming as much a networking challenge as a compute challenge. Beyond the Traditional Data Center One of the more notable themes of the discussion was Khemakhem’s view that the term “data center” has become too broad to be useful. The industry often speaks of data centers as a single category, but Clearfield increasingly differentiates between hyperscale campuses, colocation facilities, central office environments, and a rapidly emerging class of edge deployments. “There is no one-size-fits-all data center,” Khemakhem said, describing a continuum that extends from hyperscale facilities all the way to edge locations positioned near users and applications. 
That distinction matters because many AI applications are introducing latency requirements that cannot always be addressed by centralized facilities alone. As AI inference moves closer to users,


Liquid Cooling Market Matures: Innovations, Acquisitions, and Modular Solutions for AI Infrastructure

Thermal Validation Becomes a Strategic Capability Cooling is no longer simply a matter of installing enough CRAC units, chillers, CDUs, or rear-door heat exchangers. As rack densities climb and chip-level heat flux intensifies, the performance of the entire thermal chain increasingly depends on how coldplates, manifolds, pumps, controls, facility water loops, power systems, commissioning practices, and service workflows interact. Vertiv said Strategic Thermal Labs will help it simulate and emulate real-world high-density compute conditions, optimize interactions between the thermal chain and power train, and support customers across design, integration, commissioning, and lifecycle operations. That reflects a broader evolution underway in AI infrastructure. Data centers are becoming tightly coupled systems where thermal behavior influences power design, reliability, serviceability, operational efficiency, and ultimately the utilization of increasingly expensive accelerator platforms. Vertiv also emphasized that the acquisition does not alter its commitment to interoperable, server- and silicon-agnostic infrastructure solutions. That distinction matters because hyperscale and colocation operators remain wary of vendor lock-in at a time when chip architectures, server designs, and cooling strategies continue to evolve rapidly. Viewed through that lens, Vertiv’s acquisition reflects a larger industry shift. Infrastructure providers are no longer waiting for server OEMs or chipmakers to define the cooling roadmap. Instead, they are investing deeper in modeling, validation, and chip-level thermal expertise because the next generation of AI infrastructure performance will increasingly be determined by how effectively those systems work together. Accelsius Moves from Technology Validation to Market Scaling Accelsius offers a different view of where the liquid-cooling market is headed. 
While some vendors are extending existing architectures, Accelsius is focused on making two-phase direct-to-chip cooling easier to deploy, validate, and scale. The company’s recently introduced NeuCool IR150 is designed around that objective. Described by Accelsius as the industry’s first fully integrated rack-level cooling solution for two-phase liquid cooling,


DCF Poll: Which Technology Will Define the Next Generation of AI Data Centers?

Matt Vincent is Editor in Chief of Data Center Frontier, where he leads editorial strategy and coverage focused on the infrastructure powering cloud computing, artificial intelligence, and the digital economy. A veteran B2B technology journalist with more than two decades of experience, Vincent specializes in the intersection of data centers, power, cooling, and emerging AI-era infrastructure. Since assuming the EIC role in 2023, he has helped guide Data Center Frontier’s coverage of the industry’s transition into the gigawatt-scale AI era, with a focus on hyperscale development, behind-the-meter power strategies, liquid cooling architectures, and the evolving energy demands of high-density compute, while working closely with the Digital Infrastructure Group at Endeavor Business Media to expand the brand’s analytical and multimedia footprint. Vincent also hosts The Data Center Frontier Show podcast, where he interviews industry leaders across hyperscale, colocation, utilities, and the data center supply chain to examine the technologies and business models reshaping digital infrastructure. He has served as Head of Content for the Data Center Frontier Trends Summit since its inception. Before becoming Editor in Chief, he served in multiple senior editorial roles across Endeavor Business Media’s digital infrastructure portfolio, with coverage spanning data centers and hyperscale infrastructure, structured cabling and networking, telecom and datacom, IP physical security, and wireless and Pro AV markets. He began his career in 2005 within PennWell’s Advanced Technology Division and later held senior editorial positions supporting brands such as Cabling Installation & Maintenance, Lightwave Online, Broadband Technology Report, and Smart Buildings Technology. 
Vincent is a frequent moderator, interviewer, and keynote speaker at industry events including the HPC Forum, where he delivers forward-looking analysis on how AI and high-performance computing are reshaping digital infrastructure. He graduated with honors from Indiana University Bloomington with a B.A. in English Literature and Creative Writing and lives in southern New Hampshire with


Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote $200 billion between them to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.


John Deere unveils more autonomous farm machines to address skill labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do


2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to


OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. 
What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
