LLM + RAG: Creating an AI-Powered File Reader Assistant

Introduction

AI is everywhere. 

It is hard not to interact at least once a day with a Large Language Model (LLM). The chatbots are here to stay. They’re in your apps, they help you write better, they compose emails, they read emails…well, they do a lot.

And I don’t think that is bad. In fact, my opinion is quite the opposite – at least so far. I defend and advocate for the use of AI in our daily lives because, let’s agree, it makes everything much easier.

I don’t have to spend time double-reading a document to find punctuation problems or typos. AI does that for me. I don’t waste time writing that follow-up email every single Monday. AI does that for me. I don’t need to read a huge and boring contract when I have an AI to summarize the main takeaways and action points for me!

These are only some of AI’s great uses. If you’d like to know more use cases of LLMs to make our lives easier, I wrote a whole book about them.

Now, thinking as a data scientist and looking at the technical side, not everything is that bright and shiny. 

LLMs are great for several general use cases that apply to anyone or any company. For example, coding, summarizing, or answering questions about general content created before the training cutoff date. However, when it comes to specific business applications, a single niche purpose, or something new that didn’t make the cutoff date, the models won’t be that useful out of the box – meaning, they will not know the answer. Thus, they will need adjustments.

Training an LLM can take months and millions of dollars. What is even worse is that if we don’t adjust and tune the model to our purpose, we will get unsatisfactory results or hallucinations (when the model’s response doesn’t make sense given our query).

So what is the solution, then? Spending a lot of money retraining the model to include our data?

Not really. That’s when Retrieval-Augmented Generation (RAG) becomes useful.

RAG is a framework that combines retrieving information from an external knowledge base with the generation capabilities of large language models (LLMs). It helps AI models produce more accurate and relevant responses.

Let’s learn more about RAG next.

What is RAG?

Let me tell you a story to illustrate the concept.

I love movies. There was a time when I knew which movies were competing for Best Picture at the Oscars, or which actors and actresses were nominated. And I would certainly know which ones took the statue home that year. But now I am all rusty on that subject. If you asked me who was competing, I would not know. And even if I tried to answer you, I would give you a weak response.

So, to provide you with a quality response, I will do what everybody else does: search for the information online, obtain it, and then give it to you. What I just did is the same idea as RAG: I retrieved data from an external source to give you an answer.

When we enhance the LLM with a content store where it can go and retrieve data to augment (increase) its knowledge base, that is the RAG framework in action.

RAG is like creating a content store where the model can enhance its knowledge and respond more accurately.

Diagram: User prompts and content using LLM + RAG
User prompt about Content C. The LLM retrieves external content to augment the answer. Image by the author.

Summarizing, the RAG framework:

  1. Uses search algorithms to query external data sources, such as databases, knowledge bases, and web pages.
  2. Pre-processes the retrieved information.
  3. Incorporates the pre-processed information into the LLM.
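To make that flow concrete, here is a minimal, purely illustrative sketch of the idea in Python. The function and object names (rag_answer, knowledge_base.search, llm.generate) are made up for illustration; they are not a specific library’s API.

# Illustrative sketch of the RAG flow; the names here are invented, not a real API
def rag_answer(query, knowledge_base, llm):
    # 1. Retrieve: search the external knowledge base for content related to the query
    documents = knowledge_base.search(query)

    # 2. Augment: add the retrieved content to the prompt
    prompt = f"Context:\n{documents}\n\nQuestion: {query}"

    # 3. Generate: the LLM answers using the augmented context
    return llm.generate(prompt)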

Why use RAG?

Now that we know what the RAG framework is, let’s understand why we should be using it.

Here are some of the benefits:

  • RAG enhances factual accuracy by referencing real data.
  • RAG can help LLMs process and consolidate knowledge to create more relevant answers.
  • RAG can help LLMs access additional knowledge bases, such as internal organizational data.
  • RAG can help LLMs create more accurate domain-specific content.
  • RAG can help reduce knowledge gaps and AI hallucinations.

As previously explained, I like to say that with the RAG framework we are giving the LLM an internal search engine over the content we want to add to its knowledge base.

Well. All of that is very interesting. But let’s see an application of RAG. We will learn how to create an AI-powered PDF Reader Assistant.

Project

This is an application that allows users to upload a PDF document and ask questions about its content using AI-powered natural language processing (NLP) tools. 

  • The app uses Streamlit as the front end.
  • LangChain, OpenAI’s GPT-4o mini model, and FAISS (Facebook AI Similarity Search) handle document retrieval and question answering in the backend.

Let’s break down the steps for better understanding:

  1. Loading a PDF file and splitting it into chunks of text. This makes the data optimized for retrieval.
  2. Present the chunks to an embedding tool. Embeddings are numerical vector representations of data used to capture relationships, similarities, and meanings in a way that machines can understand. They are widely used in Natural Language Processing (NLP), recommender systems, and search engines.
  3. Next, we put those chunks of text and embeddings in the same DB for retrieval.
  4. Finally, we make it available to the LLM.

Data preparation

Preparing a content store for the LLM will take some steps, as we just saw. So, let’s start by creating a function that can load a file and split it into text chunks for efficient retrieval.

# Imports
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_document(pdf):
    # Load a PDF
    """
    Load a PDF and split it into chunks for efficient retrieval.

    :param pdf: PDF file to load
    :return: List of chunks of text
    """

    loader = PyPDFLoader(pdf)
    docs = loader.load()

    # Instantiate the text splitter with a chunk size of 500 characters and an overlap of 100 characters so that context is not lost
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    # Split into chunks for efficient retrieval
    chunks = text_splitter.split_documents(docs)

    # Return
    return chunks
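If you want to test this function on its own before wiring it into the app, a quick standalone check could look like the snippet below (the PDF name is just a placeholder):

# Quick standalone check of load_document; "example.pdf" is only a placeholder name
chunks = load_document("example.pdf")
print(f"Number of chunks: {len(chunks)}")
print(chunks[0].page_content[:200])  # preview the beginning of the first chunk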

Next, we will start building our Streamlit app, and we’ll use that function in the next script.

Web application

We will begin importing the necessary modules in Python. Most of those will come from the langchain packages.

FAISS is used for document retrieval; OpenAIEmbeddings transforms the text chunks into numerical vectors so that similarity between the query and the chunks can be calculated; ChatOpenAI is what enables us to interact with the OpenAI API; create_retrieval_chain is what actually performs the RAG step, retrieving documents and augmenting the LLM with that data; create_stuff_documents_chain glues the model and the ChatPromptTemplate together.

Note: You will need to generate an OpenAI API key to run this script. If you are creating your account for the first time, you get some free credits. But if you have had it for a while, you may have to add 5 dollars in credits to access OpenAI’s API. An alternative is using Hugging Face’s embeddings. 

# Imports
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.chains import create_retrieval_chain
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from scripts.secret import OPENAI_KEY
from scripts.document_loader import load_document
import streamlit as st
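A quick note on the OPENAI_KEY import above: the script reads the key from a local scripts/secret.py module, but an environment variable works just as well. Here is a minimal sketch, assuming you have exported a variable called OPENAI_API_KEY beforehand:

# Alternative to scripts.secret: read the key from an environment variable
# (assumes OPENAI_API_KEY was exported in your shell before running the app)
import os

OPENAI_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")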

The next code snippet creates the app title and a box for file upload, and prepares the file to be passed to the load_document() function.

# Create a Streamlit app
st.title("AI-Powered Document Q&A")

# Load document to streamlit
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

# If a file is uploaded, create the TextSplitter and vector database
if uploaded_file:

    # Code to work around document loader from Streamlit and make it readable by langchain
    temp_file = "./temp.pdf"
    with open(temp_file, "wb") as file:
        file.write(uploaded_file.getvalue())
        file_name = uploaded_file.name

    # Load document and split it into chunks for efficient retrieval.
    chunks = load_document(temp_file)

    # Message user that document is being processed with time emoji
    st.write("Processing document... :watch:")

Machines understand numbers better than text, so in the end, we will have to provide the model with a database of numbers that it can compare and check for similarity when performing a query. That’s where embeddings come in: we use them to create the vector_db in this next piece of code.

    # Generate embeddings
    # Embeddings are numerical vector representations of data, typically used to capture relationships, similarities,
    # and meanings in a way that machines can understand. They are widely used in Natural Language Processing (NLP),
    # recommender systems, and search engines.
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_KEY,
                                  model="text-embedding-ada-002")

    # Can also use HuggingFaceEmbeddings
    # from langchain_huggingface.embeddings import HuggingFaceEmbeddings
    # embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # Create vector database containing chunks and embeddings
    vector_db = FAISS.from_documents(chunks, embeddings)
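At this point, you can optionally sanity-check the vector store with a direct similarity search; the query below is just an example:

    # Optional sanity check: fetch the chunks most similar to a sample query
    sample_hits = vector_db.similarity_search("What is this document about?", k=2)
    for doc in sample_hits:
        print(doc.page_content[:150])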

Next, we create a retriever object to navigate the vector_db, and we instantiate the LLM that will answer the questions.

    # Create a document retriever
    retriever = vector_db.as_retriever()
    llm = ChatOpenAI(model_name="gpt-4o-mini", openai_api_key=OPENAI_KEY)

Then, we will create the system_prompt, which is a set of instructions to the LLM on how to answer, and we will create a prompt template, preparing it to be added to the model once we get the input from the user.

    # Create a system prompt
    # It sets the overall context for the model.
    # It influences tone, style, and focus before user interaction starts.
    # Unlike user inputs, a system prompt is not visible to the end user.

    system_prompt = (
        "You are a helpful assistant. Use the given context to answer the question."
        "If you don't know the answer, say you don't know. "
        "{context}"
    )

    # Create a prompt Template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{input}"),
        ]
    )

    # Create a chain
    # It creates a StuffDocumentsChain, which takes multiple documents (text data) and "stuffs" them together before passing them to the LLM for processing.

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
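As an aside, to see what that “stuffing” means in practice, this chain can be exercised on its own with a handful of retrieved documents, reusing the retriever and question_answer_chain objects defined above (illustrative only, not part of the app flow):

    # Aside (illustrative, not part of the app): the stuff-documents chain takes the
    # retrieved documents ("context") and the question ("input"), stuffs the documents'
    # text into the {context} placeholder of the prompt, and returns the LLM's answer as a string.
    sample_docs = retriever.invoke("What is this document about?")
    print(question_answer_chain.invoke({"input": "What is this document about?",
                                        "context": sample_docs}))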

Moving on, we create the core of the RAG framework, chaining together the retriever object and the question-answer chain. This object fetches relevant documents from a data source (e.g., a vector database) and hands them, together with the user’s question, to the LLM to generate a response.

    # Create the RAG chain
    chain = create_retrieval_chain(retriever, question_answer_chain)

Finally, we create the variable question for the user input. If this question box is filled with a query, we pass it to the chain, which calls the LLM to process and return the response, which will be printed on the app’s screen.

    # Streamlit input for question
    question = st.text_input("Ask a question about the document:")
    if question:
        # Answer
        response = chain.invoke({"input": question})['answer']
        st.write(response)
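If you also want to show which chunks grounded the answer, the chain’s response carries the retrieved documents under the "context" key. An optional variant of the block above could look like this (the expander is just one way to display the sources):

    # Optional variant: display the answer and the retrieved chunks that grounded it.
    # create_retrieval_chain returns the retrieved documents under the "context" key.
    if question:
        result = chain.invoke({"input": question})
        st.write(result["answer"])
        with st.expander("Sources"):
            for doc in result["context"]:
                st.write(doc.page_content[:300])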

Here is a screenshot of the result.

Screenshot of the AI-Powered Document Q&A
Screenshot of the final app. Image by the author.

And this is a GIF for you to see the File Reader AI Assistant in action!

GIF of the File Reader AI Assistant in action
File Reader AI Assistant in action. Image by the author.

Before you go

In this project, we learned what the RAG framework is and how it helps the LLM perform better and also handle specific knowledge well.

AI can be powered with knowledge from an instruction manual, a company’s databases, finance files, or contracts, and then respond accurately to domain-specific queries. The knowledge base is augmented with a content store.

To recap, this is how the framework works:

1️⃣ User Query → Input text is received.

2️⃣ Retrieve Relevant Documents → Searches a knowledge base (e.g., a database, vector store).

3️⃣ Augment Context → Retrieved documents are added to the input.

4️⃣ Generate Response → An LLM processes the combined input and produces an answer.

GitHub repository

https://github.com/gurezende/Basic-Rag

About me

If you liked this content and want to learn more about my work, here is my website, where you can also find all my contacts.

https://gustavorsantos.me

References

https://cloud.google.com/use-cases/retrieval-augmented-generation

https://www.ibm.com/think/topics/retrieval-augmented-generation

https://youtu.be/T-D1OfcDW1M?si=G0UWfH5-wZnMu0nw

https://python.langchain.com/docs/introduction

https://www.geeksforgeeks.org/how-to-get-your-own-openai-api-key
