How to Measure the Reliability of a Large Language Model’s Response

The basic principle of large language models (LLMs) is simple: predict the next word (or token) in a sequence based on statistical patterns in the training data. This seemingly simple capability turns out to be remarkably powerful, supporting tasks such as text summarization, idea generation, brainstorming, code generation, information processing, and content creation. That said, LLMs have no memory, nor do they actually “understand” anything; they only perform their basic function: predicting the next word.

The process of next-word prediction is probabilistic: the LLM selects each word from a probability distribution. In the process, it often generates false, fabricated, or inconsistent content as it tries to produce a coherent response and fills gaps with plausible-looking but incorrect information. This phenomenon is called hallucination, an inevitable and well-known feature of LLMs that makes it necessary to validate and corroborate their outputs.
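To make the probabilistic step concrete, here is a minimal sketch of sampling a next token from a softmax distribution. The vocabulary and scores are hypothetical and chosen only for illustration; a real LLM works over tens of thousands of tokens and computes the scores with a neural network.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores for the word after "The capital of France is"
vocab = ["Paris", "Lyon", "London"]
logits = [4.0, 1.5, 1.0]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(vocab, logits, rng=rng) for _ in range(1000)]

print({tok: round(p, 3) for tok, p in zip(vocab, probs)})
print({tok: samples.count(tok) for tok in vocab})
```

Even when the correct token has by far the highest probability, the lower-probability tokens are still drawn some of the time, which is one simple way to see how a plausible but wrong continuation can appear.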

Retrieval-augmented generation (RAG) methods, which connect an LLM to external knowledge sources, reduce hallucinations to some extent but cannot eliminate them. Although advanced RAG systems can provide in-text citations and URLs, verifying these references can be tedious and time-consuming. Therefore, we need an objective criterion for assessing the reliability or trustworthiness of an LLM’s response, whether it is generated from the model’s own knowledge or from an external knowledge base (RAG).

In this article, we will discuss how an LLM’s output can be assessed by a trustworthy language model, which assigns a trustworthiness score to the output. We will first use such a model to score an LLM’s answers and explain the scores. Subsequently, we will develop an example RAG with LlamaParse and LlamaIndex that assesses the RAG’s answers for trustworthiness.

The entire code of this article is available in the Jupyter notebook on GitHub.

Assigning a Trustworthiness Score to an LLM’s Answer

To demonstrate how we can assign a trustworthiness score to an LLM’s response, I will use Cleanlab’s Trustworthy Language Model (TLM). Such TLMs use a combination of uncertainty quantification and consistency analysis to compute trustworthiness scores and explanations for LLM responses.
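To build intuition for the consistency part, here is a toy sketch. It is not Cleanlab’s algorithm; it only illustrates the general idea of asking the same question several times and measuring how often the answers agree. The sampled answers are hypothetical.

```python
from collections import Counter

def consistency_score(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    top_answer, top_count = Counter(normalized).most_common(1)[0]
    return top_answer, top_count / len(normalized)

# Hypothetical answers from re-asking the same question five times
sampled = ["5", "6", "5", "5", "6"]
answer, score = consistency_score(sampled)
print(answer, score)
```

A model that keeps changing its answer gets a low agreement fraction, which is a reasonable signal that any single answer deserves less trust.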

Cleanlab offers a free trial API key, which can be obtained by creating an account on their website. We first need to install Cleanlab’s Python client:

pip install --upgrade cleanlab-studio

Cleanlab supports several proprietary models such as ‘gpt-4o’, ‘gpt-4o-mini’, ‘o1-preview’, ‘claude-3-sonnet’, ‘claude-3.5-sonnet’, ‘claude-3.5-sonnet-v2’ and others. Here is how TLM assigns a trustworthiness score to gpt-4o’s answer. The trustworthiness score ranges from 0 to 1, where higher values indicate greater trustworthiness.

from cleanlab_studio import Studio
studio = Studio("")  # Get your API key from above
tlm = studio.TLM(options={"log": ["explanation"], "model": "gpt-4o"}) # GPT, Claude, etc
#set the prompt
out = tlm.prompt("How many vowels are there in the word 'Abracadabra'.?")
#the TLM response contains the actual output 'response', trustworthiness score and explanation
print(f"Model's response = {out['response']}")
print(f"Trustworthiness score = {out['trustworthiness_score']}")
print(f"Explanation = {out['log']['explanation']}")

The above code tested the response of gpt-4o for the question “How many vowels are there in the word ‘Abracadabra’.?”. The TLM’s output contains the model’s answer (response), trustworthiness score, and explanation. Here is the output of this code.

Model's response = The word "Abracadabra" contains 6 vowels. The vowels are: A, a, a, a, a, and a.
Trustworthiness score = 0.6842228802750124
Explanation = This response is untrustworthy due to a lack of consistency in possible responses from the model. Here's one inconsistent alternate response that the model considered (which may not be accurate either):
5.

Even one of the most advanced language models gets this simple task wrong: ‘Abracadabra’ contains five vowels, not six, and the relatively low trustworthiness score flags the answer as doubtful. Here is the response and trustworthiness score for the same question for claude-3.5-sonnet-v2.

Model's response = Let me count the vowels in 'Abracadabra':
A-b-r-a-c-a-d-a-b-r-a

The vowels are: A, a, a, a, a

There are 5 vowels in the word 'Abracadabra'.
Trustworthiness score = 0.9378276048845285
Explanation = Did not find a reason to doubt trustworthiness.
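For a question like this one, the ground truth is easy to check deterministically, which makes it a convenient sanity test for a trustworthiness score. A minimal sketch, where count_vowels is a helper written for this illustration:

```python
def count_vowels(word):
    """Count the vowels (a, e, i, o, u) in a word, ignoring case."""
    return sum(1 for ch in word.lower() if ch in "aeiou")

print(count_vowels("Abracadabra"))  # 5
```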

claude-3.5-sonnet-v2 produces the correct output. Let’s compare the two models’ responses to another question.

from cleanlab_studio import Studio
from IPython.display import display, Markdown

# Initialize the Cleanlab Studio with API key
studio = Studio("")  # Replace with your actual API key

# List of models to evaluate
models = ["gpt-4o", "claude-3.5-sonnet-v2"]

# Define the prompt
prompt_text = "Which one of 9.11 and 9.9 is bigger?"

# Loop through each model and evaluate
for model in models:
   tlm = studio.TLM(options={"log": ["explanation"], "model": model})
   out = tlm.prompt(prompt_text)
  
   md_content = f"""
## Model: {model}

**Response:** {out['response']}

**Trustworthiness Score:** {out['trustworthiness_score']}

**Explanation:** {out['log']['explanation']}

---
"""
   display(Markdown(md_content))

Here is the response of the two models:

Wrong outputs generated by gpt-4o and claude-3.5-sonnet-v2, represented by low trustworthiness score

We can also generate a trustworthiness score for open-source LLMs. Let’s check the recent, much-hyped open-source LLM: DeepSeek-R1. I will use DeepSeek-R1-Distill-Llama-70B, which is based on Meta’s Llama-3.3-70B-Instruct model and distilled from DeepSeek’s larger 671-billion-parameter Mixture of Experts (MoE) model. Knowledge distillation is a machine learning technique that aims to transfer the learnings of a large pre-trained model, the “teacher model,” to a smaller “student model.”

import os
import streamlit as st
from cleanlab_studio import Studio
from IPython.display import display, Markdown
from langchain_groq.chat_models import ChatGroq
# Read the Groq API key from Streamlit secrets
os.environ["GROQ_API_KEY"] = st.secrets["GROQ_API_KEY"]
# Initialize the DeepSeek-R1 distilled Llama model served by Groq
model_name = "deepseek-r1-distill-llama-70b"
groq_llm = ChatGroq(model=model_name, temperature=0.5)
prompt = "Which one of 9.11 and 9.9 is bigger?"
# Get the response from the model
response = groq_llm.invoke(prompt)
# Initialize Cleanlab's studio
studio = Studio("")  # Replace with your actual API key; never hard-code it
cleanlab_tlm = studio.TLM(options={"log": ["explanation"]})  # for explanations
# Score the existing response: returns the trustworthiness score and explanation
output = cleanlab_tlm.get_trustworthiness_score(prompt, response=response.content.strip())
md_content = f"""
## Model: {model_name}
**Response:** {response.content.strip()}
**Trustworthiness Score:** {output['trustworthiness_score']}
**Explanation:** {output['log']['explanation']}
---
"""
display(Markdown(md_content))

Here is the output of deepseek-r1-distill-llama-70b model.

The correct output of deepseek-r1-distill-llama-70b model with a high trustworthiness score

Developing a Trustworthy RAG

We will now develop a RAG to demonstrate how we can measure the trustworthiness of an LLM response in RAG. This RAG will be developed by scraping data from given links, parsing it in markdown format, and creating a vector store.

The following libraries need to be installed for the next code.

pip install llama-parse llama-index-core llama-index-embeddings-huggingface \
    llama-index-llms-cleanlab requests beautifulsoup4 pdfkit nest-asyncio

To render HTML into PDF format, we also need to install wkhtmltopdf command line tool from their website.

The following libraries will be imported:

import os
from typing import ClassVar, Dict, List
import nest_asyncio
import pdfkit
import requests
from bs4 import BeautifulSoup
from llama_parse import LlamaParse
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events import BaseEvent
from llama_index.core.instrumentation.events.llm import LLMCompletionEndEvent
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.cleanlab import CleanlabTLM
# Allow nested event loops so LlamaParse's async calls work inside a notebook
nest_asyncio.apply()

The next steps involve scraping data from given URLs using Python’s BeautifulSoup library, saving the scraped data as PDF file(s) using pdfkit, and parsing the PDF(s) into markdown using LlamaParse, a genAI-native document parsing platform built with LLMs and for LLM use cases.

We will first configure the LLM to be used by CleanlabTLM and the embedding model (Huggingface embedding model BAAI/bge-small-en-v1.5) that will be used to compute the embeddings of the scraped data to create the vector store.

options = {
   "model": "gpt-4o",
   "max_tokens": 512,
   "log": ["explanation"]
}
llm = CleanlabTLM(api_key="", options=options)#Get your free API from https://cleanlab.ai/
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(
   model_name="BAAI/bge-small-en-v1.5"
)

We will now define a custom event handler, GetTrustworthinessScore, that is derived from a base event handler class. This handler gets triggered by the end of an LLM completion and extracts a trustworthiness score from the response metadata. A helper function, display_response, displays the LLM’s response along with its trustworthiness score.

# Event Handler for Trustworthiness Score
class GetTrustworthinessScore(BaseEventHandler):
   events: ClassVar[List[BaseEvent]] = []
   trustworthiness_score: float = 0.0
   @classmethod
   def class_name(cls) -> str:
       return "GetTrustworthinessScore"
   def handle(self, event: BaseEvent) -> Dict:
       if isinstance(event, LLMCompletionEndEvent):
           self.trustworthiness_score = event.response.additional_kwargs.get("trustworthiness_score", 0.0)
           self.events.append(event)
       return {}
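# Register the handler on the root dispatcher so it receives LLM completion events
root_dispatcher = get_dispatcher()
event_handler = GetTrustworthinessScore()
root_dispatcher.add_event_handler(event_handler)
# Helper function to display LLM's response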
def display_response(response):
   response_str = response.response
   trustworthiness_score = event_handler.trustworthiness_score
   print(f"Response: {response_str}")
   print(f"Trustworthiness score: {round(trustworthiness_score, 2)}")

We will now generate PDFs by scraping data from given URLs. For demonstration, we will scrape data only from this Wikipedia article about large language models (Creative Commons Attribution-ShareAlike 4.0 License).

Note: Readers are advised to always double-check the status of the content/data they are about to scrape and ensure they are allowed to do so.

The following piece of code scrapes data from the given URLs by making an HTTP request and using BeautifulSoup Python library to parse the HTML content. HTML content is cleaned by converting protocol-relative URLs to absolute ones. Subsequently, the scraped content is converted into a PDF file(s) using pdfkit.

##########################################
# PDF Generation from Multiple URLs
##########################################
# Configure wkhtmltopdf path
wkhtml_path = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
config = pdfkit.configuration(wkhtmltopdf=wkhtml_path)
# Define URLs and assign document names
urls = {
   "LLMs": "https://en.wikipedia.org/wiki/Large_language_model"
}
# Directory to save PDFs
pdf_directory = "PDFs"
os.makedirs(pdf_directory, exist_ok=True)
pdf_paths = {}
for doc_name, url in urls.items():
   try:
       print(f"Processing {doc_name} from {url} ...")
       response = requests.get(url, timeout=30)
       response.raise_for_status()  # surface HTTP errors instead of parsing an error page
       soup = BeautifulSoup(response.text, "html.parser")
       main_content = soup.find("div", {"id": "mw-content-text"})
       if main_content is None:
           raise ValueError("Main content not found")
       # Replace protocol-relative URLs with absolute URLs
       html_string = str(main_content).replace('src="//', 'src="https://').replace('href="//', 'href="https://')
       pdf_file_path = os.path.join(pdf_directory, f"{doc_name}.pdf")
       pdfkit.from_string(
           html_string,
           pdf_file_path,
           options={'encoding': 'UTF-8', 'quiet': ''},
           configuration=config
       )
       pdf_paths[doc_name] = pdf_file_path
       print(f"Saved PDF for {doc_name} at {pdf_file_path}")
   except Exception as e:
       print(f"Error processing {doc_name}: {e}")
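The URL clean-up step in the code above can be looked at in isolation. The sketch below wraps the same string replacement in a small helper (absolutize_protocol_relative is a name made up for this illustration) and runs it on a tiny hypothetical HTML snippet.

```python
def absolutize_protocol_relative(html, scheme="https"):
    """Rewrite src="//..." and href="//..." so they carry an explicit scheme."""
    for attr in ("src", "href"):
        html = html.replace(f'{attr}="//', f'{attr}="{scheme}://')
    return html

# Hypothetical snippet: one protocol-relative image and one root-relative link
snippet = '<img src="//upload.wikimedia.org/x.png"><a href="/wiki/LLM">LLM</a>'
cleaned = absolutize_protocol_relative(snippet)
print(cleaned)
```

Note that only protocol-relative URLs (those starting with //) are rewritten. Root-relative links such as /wiki/LLM are left unchanged, so they will not resolve inside the generated PDF unless they are handled separately.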

After generating PDF(s) from the scraped data, we parse these PDFs using LlamaParse. We set the parsing instructions to extract the content in markdown format and parse the document(s) page-wise along with the document name and page number. These extracted entities (pages) are referred to as nodes. The parser iterates over the extracted nodes and updates each node’s metadata by appending a citation header which facilitates later referencing.

##########################################
# Parse PDFs with LlamaParse and Inject Metadata
##########################################

# Define parsing instructions (if your parser supports it)
parsing_instructions = """Extract the document content in markdown.
Split the document into nodes (for example, by page).
Ensure each node has metadata for document name and page number."""
      
# Create a LlamaParse instance
parser = LlamaParse(
   api_key="",  #Replace with your actual key
   parsing_instructions=parsing_instructions,
   result_type="markdown",
   premium_mode=True,
   max_timeout=600
)
# Directory to save combined Markdown files (one per PDF)
output_md_dir = os.path.join(pdf_directory, "markdown_docs")
os.makedirs(output_md_dir, exist_ok=True)
# List to hold all updated nodes for indexing
all_nodes = []
for doc_name, pdf_path in pdf_paths.items():
   try:
       print(f"Parsing PDF for {doc_name} from {pdf_path} ...")
       nodes = parser.load_data(pdf_path)  # Returns a list of nodes
       updated_nodes = []
       # Process each node: update metadata and inject citation header into the text.
       for i, node in enumerate(nodes, start=1):
           # Copy existing metadata (if any) and add our own keys.
           new_metadata = dict(node.metadata) if node.metadata else {}
           new_metadata["document_name"] = doc_name
           if "page_number" not in new_metadata:
               new_metadata["page_number"] = str(i)
           # Build the citation header.
           citation_header = f"[{new_metadata['document_name']}, page {new_metadata['page_number']}]\n\n"
           # Prepend the citation header to the node's text.
           updated_text = citation_header + node.text
           new_node = node.__class__(text=updated_text, metadata=new_metadata)
           updated_nodes.append(new_node)
       # Save a single combined Markdown file for the document using the updated node texts.
       combined_texts = [node.text for node in updated_nodes]
       combined_md = "\n\n---\n\n".join(combined_texts)
       md_filename = f"{doc_name}.md"
       md_filepath = os.path.join(output_md_dir, md_filename)
       with open(md_filepath, "w", encoding="utf-8") as f:
           f.write(combined_md)
       print(f"Saved combined markdown for {doc_name} to {md_filepath}")
       # Add the updated nodes to the global list for indexing.
       all_nodes.extend(updated_nodes)
       print(f"Parsed {len(updated_nodes)} nodes from {doc_name}.")
   except Exception as e:
       print(f"Error parsing {doc_name}: {e}")
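The citation-header logic does not depend on LlamaParse, so it can be tried out on its own. The sketch below uses a hypothetical Page dataclass as a stand-in for the parsed nodes; the real node class and its metadata keys may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """Hypothetical stand-in for a parsed node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)

def add_citation_headers(pages, doc_name):
    """Return new pages whose text starts with '[doc_name, page N]'."""
    updated = []
    for i, page in enumerate(pages, start=1):
        meta = dict(page.metadata)  # copy so the input pages stay untouched
        meta["document_name"] = doc_name
        meta.setdefault("page_number", str(i))  # keep an existing page number
        header = f"[{doc_name}, page {meta['page_number']}]\n\n"
        updated.append(Page(text=header + page.text, metadata=meta))
    return updated

pages = [Page("Intro text"), Page("More text", {"page_number": "7"})]
cited = add_citation_headers(pages, "LLMs")
for page in cited:
    print(repr(page.text))
```

Because the header is part of the node text, it is embedded and retrieved together with the content, so the LLM sees the source of each chunk in its context.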

We now create a vector store and a query engine. We define a custom prompt template to guide the LLM’s behavior in answering the questions. Finally, we create a query engine with the created index to answer queries. For each query, we retrieve the top 3 nodes from the vector store based on their semantic similarity with the query. The LLM uses these retrieved nodes to generate the final answer.

##########################################
# Create Index and Query Engine
##########################################
# Create an index from all nodes.
index = VectorStoreIndex.from_documents(documents=all_nodes)
# Define a custom prompt template that restricts answers to the retrieved context.
prompt_template = """
You are an AI assistant with expertise in the subject matter.
Answer the question using ONLY the provided context.
Answer in well-formatted Markdown with bullets and sections wherever necessary.
If the provided context does not support an answer, respond with "I don't know."
Context:
{context_str}
Question:
{query_str}
Answer:
"""
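# Create a query engine with the custom prompt.
# The template must be wrapped in a PromptTemplate and passed as text_qa_template.
from llama_index.core import PromptTemplate
query_engine = index.as_query_engine(similarity_top_k=3, llm=llm, text_qa_template=PromptTemplate(prompt_template))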
print("Combined index and query engine created successfully!")
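To illustrate what retrieving the top 3 nodes by semantic similarity means, here is a minimal sketch using cosine similarity on made-up 3-dimensional vectors. Real embeddings from BAAI/bge-small-en-v1.5 have many more dimensions, and the vector store may use a different similarity implementation internally.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, node_vecs, k=3):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings for four nodes and one query
node_vecs = {
    "page 1": [1.0, 0.0, 0.0],
    "page 2": [0.9, 0.1, 0.0],
    "page 3": [0.0, 1.0, 0.0],
    "page 4": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.05, 0.0]

results = top_k(query_vec, node_vecs, k=3)
for name, score in results:
    print(name, round(score, 3))
```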

Now let’s test the RAG for some queries and their corresponding trustworthiness scores.

query = "When is mixture of experts approach used?"
response = query_engine.query(query)
display_response(response)

Response to the query ‘When is mixture of experts approach used?’ (image by author)

query = "How do you compare Deepseek model with OpenAI's models?"
response = query_engine.query(query)
display_response(response)

Response to the query ‘How do you compare the Deepseek model with OpenAI’s models?’ (image by author)

Assigning a trustworthiness score to LLM’s response, whether generated through direct inference or RAG, helps to define the reliability of AI’s output and prioritize human verification where needed. This is particularly important for critical domains where a wrong or unreliable response could have severe consequences.
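One simple way to act on these scores is to route low-scoring responses to a human reviewer. The sketch below is only an illustration: the triage helper and the 0.8 threshold are assumptions, and a suitable threshold has to be chosen per application.

```python
def triage(scored_responses, threshold=0.8):
    """Split responses into accepted and needs-review (lowest score first)."""
    accepted = [r for r in scored_responses if r["trustworthiness_score"] >= threshold]
    review = sorted(
        (r for r in scored_responses if r["trustworthiness_score"] < threshold),
        key=lambda r: r["trustworthiness_score"],
    )
    return accepted, review

# Scores rounded from the vowel-counting example earlier in the article
scored = [
    {"model": "gpt-4o", "trustworthiness_score": 0.68},
    {"model": "claude-3.5-sonnet-v2", "trustworthiness_score": 0.94},
]

accepted, review = triage(scored)
print("accepted:", [r["model"] for r in accepted])
print("needs review:", [r["model"] for r in review])
```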

That’s all folks! If you like the article, please follow me on Medium and LinkedIn.

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Vår Energi lets 3-year contract for harsh-environment rig for NCS work

CERT-EU blames Trivy supply chain attack for Europa.eu data breach

Back door credentials The Trivy compromise dates to February, when TeamPCP exploited a misconfiguration in Trivy’s GitHub Actions environment, now identified as CVE-2026-33634, to establish a foothold via a privileged access token, according to Aqua Security. Discovering this, Aqua Security rotated credentials but, because some credentials remain valid during this

French government take Bull by horns for €404 million

It’s the second time that Bull has been nationalized: The first time, in 1982 was to save it from bankruptcy. Atos, has had financial troubles of its own. In August 2024, it tried — and failed — to sell its legacy infrastructure management business. The company had already staved off

Cisco fixes critical IMC auth bypass present in many products

Cisco has released patches for a critical vulnerability in its out-of-band management solution, present in many of its servers and appliances. The flaw allows unauthenticated remote attackers to gain admin access to the Cisco Integrated Management Controller (IMC), which gives administrators remote control over servers even when the main OS

Latin America returns to the energy security conversation at CERAWeek

With geopolitical risk central to conversations about energy, and with long-cycle supply once again in focus, Latin America’s mix of hydrocarbons and export potential drew renewed attention at CERAWeek by S&P Global in Houston. Argentina, resource story to export platform Among the regional stories, Argentina stood out as Vaca Muerta was no longer discussed simply as a large unconventional resource, but whether the country could turn resource quality into sustained export capacity. Country officials talked about scale: more operators, more services, more infrastructure, and a larger industrial base around the unconventional play. Daniel González, Vice Minister of Energy and Mining for Argentina, put it plainly: “The time has come to expand the Vaca Muerta ecosystem.” What is at stake now is not whether the basin works, but whether the country can build enough above-ground capacity and regulatory consistency to keep development moving. Horacio Marín, chairman and chief executive officer of YPF, offered an expansive version of that argument. He said Argentina’s energy exports could reach $50 billion/year by 2031, backed by roughly $130 billion in cumulative investment in oil, LNG, and transportation infrastructure. He said Argentine crude output could reach 1 million b/d by end-2026. He said Argentina wants to be seen less as a recurrent frontier story and more as a future supplier with scale. “The time to invest in Vaca Muerta is now,” Marín said. The LNG piece is starting to take shape. Eni, YPF, and XRG signed a joint development agreement in February to move Argentina LNG forward, with a first phase planned at 12 million tonnes/year. Southern Energy—backed by PAE, YPF, Pampa Energía, Harbour Energy, and Golar LNG—holds a long-term agreement with SEFE for 2 million tonnes/year over 8 years. The movement by global standards is early-stage and relatively modest, but it adds to Argentina’s export

Market Focus: LNG supply shocks expose limited market flexibility

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } In this Market Focus episode of the Oil & Gas Journal ReEnterprised podcast, Conglin Xu, managing editor, economics, takes a look into the LNG market shock caused by the effective closure of the Strait of Hormuz and the sudden loss of Qatari LNG supply as the Iran war continues. Xu speaks with Edward O’Toole, director of global gas analysis, RBAC Inc., to examine how these disruptions are intensifying global supply constraints at a time when European inventories were already under pressure following a colder-than-average winter and weaker storage levels. Drawing on RBAC’s G2M2 global gas market model, O’Toole outlines disruption scenarios analyzed in the firm’s recent report and explains how current events align with their findings. 
With global LNG production already operating near maximum utilization, the market response is being driven by higher prices and reduced consumption. Europe faces sharper price pressure due to storage refill needs, while Asian markets are expected to see greater demand reductions as consumers switch fuels. O’Toole underscores the importance of scenario-based modeling and supply diversification as geopolitical risk exposes structural vulnerabilities in the LNG market—offering insights for stakeholders navigating an increasingly uncertain global

Libya’s NOC, Chevron sign MoU for technical study for offshore Block NC146

@import url(‘https://fonts.googleapis.com/css2?family=Inter:[email protected]&display=swap’); a { color: var(–color-primary-main); } .ebm-page__main h1, .ebm-page__main h2, .ebm-page__main h3, .ebm-page__main h4, .ebm-page__main h5, .ebm-page__main h6 { font-family: Inter; } body { line-height: 150%; letter-spacing: 0.025em; font-family: Inter; } button, .ebm-button-wrapper { font-family: Inter; } .label-style { text-transform: uppercase; color: var(–color-grey); font-weight: 600; font-size: 0.75rem; } .caption-style { font-size: 0.75rem; opacity: .6; } #onetrust-pc-sdk [id*=btn-handler], #onetrust-pc-sdk [class*=btn-handler] { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-policy a, #onetrust-pc-sdk a, #ot-pc-content a { color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-pc-sdk .ot-active-menu { border-color: #c19a06 !important; } #onetrust-consent-sdk #onetrust-accept-btn-handler, #onetrust-banner-sdk #onetrust-reject-all-handler, #onetrust-consent-sdk #onetrust-pc-btn-handler.cookie-setting-link { background-color: #c19a06 !important; border-color: #c19a06 !important; } #onetrust-consent-sdk .onetrust-pc-btn-handler { color: #c19a06 !important; border-color: #c19a06 !important; } The National Oil Corp. of Libya (NOC) signed a memorandum of understanding (MoU) with Chevron Corp. to conduct a comprehensive technical study of offshore Block NC146. 
The block is an unexplored area with “encouraging geological indicator that could lead to significant discoveries, helping to strengthen national reserves,” NOC noted Chairman Masoud Suleman as saying, noting that the partnership is “a message of confidence in the Libyan investment environment and evidence of the return of major companies to work and explore promising opportunities in our country.” According to the NOC, Libya produces 1.4 million b/d of oil and aims to increase oil production in the coming 3-5 years to 2 million b/d and then to 3 million b/d following years of instability that impacted the country’s production. Chevron is working to add to its diverse exploration and production portfolio in the Mediterranean and Africa and continues to assess potential future opportunities in the region. The operator earlier this year entered Libya after it was designated as a winning bidder for Contract Area 106 in the Sirte basin in the 2025 Libyan Bid Round. That followed the January 2026 signing of a

Finder Energy advances KTJ Project with development area approval

Finder Energy Holdings Ltd. received regulatory approval for a development area covering the Kuda Tasi and Jahal oil fields offshore Timor‑Leste, enabling progression toward field development. Autoridade Nacional do Petróleo (ANP) approved an 88‑sq km development area over the Kuda Tasi and Jahal oil fields (KTJ Project) within PSC 19‑11 offshore Timor‑Leste, representing the first stage of the regulatory approvals process for the project. The declaration of the development area is a precursor to the field development plan (FDP), which Finder is currently preparing for submission to ANP in second‑quarter 2026. Upon approval of the FDP, the development area would secure tenure for up to 25 years or until production ceases, allowing Finder to conduct development and production operations within the area, subject to applicable regulatory approvals and conditions. The company said its upside strategy centers on the potential for the Petrojarl I FPSO to serve as a central processing and export hub for future tiebacks of surrounding discoveries, contingent on successful appraisal and/or exploration activities within PSC 19‑11. Alternatively, longer tie‑back distances could be accommodated through a secondary standalone development in the southern portion of the PSC. Finder is continuing technical evaluation of appraisal and exploration opportunities to generate drilling targets. PSC 19‑11 lies within the Laminaria High oil province of Timor‑Leste. The KTJ Project contains an estimated 25 million bbl of gross 2C contingent resources, with identified upside of an additional 23 million bbl gross 2C contingent resources and 116 million bbl gross 2U prospective resources. Finder operates PSC 19‑11 with a 66% working interest.

Newly formed Polar LNG aims to develop nearshore LNG project on Alaska’s North Slope

Polar Train LNG LLC, a newly launched company aiming to build an LNG plant (Polar LNG) on Alaska’s North Slope, has appointed Joel Riddle as president and chief executive officer. “Alaska’s North Slope holds one of the most significant undeveloped natural gas resources in the world,” said Riddle, adding “Polar LNG is uniquely positioned to bring this resource online—delivering reliable energy for Alaska and a strategic supply for the United States… and provides trusted energy to our allies.” In a release Mar. 31, the company said it is advancing a nearshore project at Prudhoe Bay, citing “one of the shortest LNG shipping routes from North America to key Asian markets, approximately 3,600 miles to Japan compared to over 10,000 miles from the US Gulf Coast.” The company is aiming for first LNG from the 7-million tonnes/year plant—to be developed nearshore with modular infrastructure—in 2029-2030 at a cost of $8–9 billion. According to Polar LNG, natural gas would be sourced from existing infrastructure at Prudhoe Bay and transported via a short pipeline to a nearshore plant. There, a modular gravity-based structure would process and liquefy the gas. LNG would then be loaded onto specialized ice-class carriers for year-round export. The company is exploring potential repurposing of sanctioned equipment built for Russia’s Arctic LNG 2 project and is seeking permission from the US govenment to acquire parts impacted by the sanctions, according to reports. Before joining Polar LNG, Riddle served as managing director and chief executive officer of Tamboran Resources Ltd.
