Stay Ahead, Stay ONMINE

R.E.D.: Scaling Text Classification with Expert Delegation

With the new age of problem-solving augmented by Large Language Models (LLMs), only a handful of problems remain that have subpar solutions. Most classification problems (at a PoC level) can be solved by leveraging LLMs at 70–90% Precision/F1 with just good prompt engineering techniques, as well as adaptive in-context-learning (ICL) examples. What happens when you want to consistently achieve performance higher than that — when prompt engineering no longer suffices? The classification conundrum Text classification is one of the oldest and most well-understood examples of supervised learning. Given this premise, it should really not be hard to build robust, well-performing classifiers that handle a large number of input classes, right…? Welp. It is. It actually has to do a lot more with the ‘constraints’ that the algorithm is generally expected to work under: low amount of training data per class high classification accuracy (that plummets as you add more classes) possible addition of new classes to an existing subset of classes quick training/inference cost-effectiveness (potentially) really large number of training classes (potentially) endless required retraining of some classes due to data drift, etc. Ever tried building a classifier beyond a few dozen classes under these conditions? (I mean, even GPT could probably do a great job up to ~30 text classes with just a few samples…) Considering you take the GPT route — If you have more than a couple dozen classes or a sizeable amount of data to be classified, you are gonna have to reach deep into your pockets with the system prompt, user prompt, few shot example tokens that you will need to classify one sample. That is after making peace with the throughput of the API, even if you are running async queries. In applied ML, problems like these are generally tricky to solve since they don’t fully satisfy the requirements of supervised learning or aren’t cheap/fast enough to be run via an LLM. This particular pain point is what the R.E.D algorithm addresses: semi-supervised learning, when the training data per class is not enough to build (quasi)traditional classifiers. The R.E.D. algorithm R.E.D: Recursive Expert Delegation is a novel framework that changes how we approach text classification. This is an applied ML paradigm — i.e., there is no fundamentally different architecture to what exists, but its a highlight reel of ideas that work best to build something that is practical and scalable. In this post, we will be working through a specific example where we have a large number of text classes (100–1000), each class only has few samples (30–100), and there are a non-trivial number of samples to classify (10,000–100,000). We approach this as a semi-supervised learning problem via R.E.D. Let’s dive in. How it works simple representation of what R.E.D. does Instead of having a single classifier classify between a large number of classes, R.E.D. intelligently: Divides and conquers — Break the label space (large number of input labels) into multiple subsets of labels. This is a greedy label subset formation approach. Learns efficiently — Trains specialized classifiers for each subset. This step focuses on building a classifier that oversamples on noise, where noise is intelligently modeled as data from other subsets. Delegates to an expert — Employes LLMs as expert oracles for specific label validation and correction only, similar to having a team of domain experts. Using an LLM as a proxy, it empirically ‘mimics’ how a human expert validates an output. Recursive retraining — Continuously retrains with fresh samples added back from the expert until there are no more samples to be added/a saturation from information gain is achieved The intuition behind it is not very hard to grasp: Active Learning employs humans as domain experts to consistently ‘correct’ or ‘validate’ the outputs from an ML model, with continuous training. This stops when the model achieves acceptable performance. We intuit and rebrand the same, with a few clever innovations that will be detailed in a research pre-print later. Let’s take a deeper look… Greedy subset selection with least similar elements When the number of input labels (classes) is high, the complexity of learning a linear decision boundary between classes increases. As such, the quality of the classifier deteriorates as the number of classes increases. This is especially true when the classifier does not have enough samples to learn from — i.e. each of the training classes has only a few samples. This is very reflective of a real-world scenario, and the primary motivation behind the creation of R.E.D. Some ways of improving a classifier’s performance under these constraints: Restrict the number of classes a classifier needs to classify between Make the decision boundary between classes clearer, i.e., train the classifier on highly dissimilar classes Greedy Subset Selection does exactly this — since the scope of the problem is Text Classification, we form embeddings of the training labels, reduce their dimensionality via UMAP, then form S subsets from them. Each of the S subsets has elements as n training labels. We pick training labels greedily, ensuring that every label we pick for the subset is the most dissimilar label w.r.t. the other labels that exist in the subset: import numpy as np from sklearn.metrics.pairwise import cosine_similarity def avg_embedding(candidate_embeddings): return np.mean(candidate_embeddings, axis=0) def get_least_similar_embedding(target_embedding, candidate_embeddings): similarities = cosine_similarity(target_embedding, candidate_embeddings) least_similar_index = np.argmin(similarities) # Use argmin to find the index of the minimum least_similar_element = candidate_embeddings[least_similar_index] return least_similar_element def get_embedding_class(embedding, embedding_map): reverse_embedding_map = {value: key for key, value in embedding_map.items()} return reverse_embedding_map.get(embedding) # Use .get() to handle missing keys gracefully def select_subsets(embeddings, n): visited = {cls: False for cls in embeddings.keys()} subsets = [] current_subset = [] while any(not visited[cls] for cls in visited): for cls, average_embedding in embeddings.items(): if not current_subset: current_subset.append(average_embedding) visited[cls] = True elif len(current_subset) >= n: subsets.append(current_subset.copy()) current_subset = [] else: subset_average = avg_embedding(current_subset) remaining_embeddings = [emb for cls_, emb in embeddings.items() if not visited[cls_]] if not remaining_embeddings: break # handle edge case least_similar = get_least_similar_embedding(target_embedding=subset_average, candidate_embeddings=remaining_embeddings) visited_class = get_embedding_class(least_similar, embeddings) if visited_class is not None: visited[visited_class] = True current_subset.append(least_similar) if current_subset: # Add any remaining elements in current_subset subsets.append(current_subset) return subsets the result of this greedy subset sampling is all the training labels clearly boxed into subsets, where each subset has at most only n classes. This inherently makes the job of a classifier easier, compared to the original S classes it would have to classify between otherwise! Semi-supervised classification with noise oversampling Cascade this after the initial label subset formation — i.e., this classifier is only classifying between a given subset of classes. Picture this: when you have low amounts of training data, you absolutely cannot create a hold-out set that is meaningful for evaluation. Should you do it at all? How do you know if your classifier is working well? We approached this problem slightly differently — we defined the fundamental job of a semi-supervised classifier to be pre-emptive classification of a sample. This means that regardless of what a sample gets classified as it will be ‘verified’ and ‘corrected’ at a later stage: this classifier only needs to identify what needs to be verified. As such, we created a design for how it would treat its data: n+1 classes, where the last class is noise noise: data from classes that are NOT in the current classifier’s purview. The noise class is oversampled to be 2x the average size of the data for the classifier’s labels Oversampling on noise is a faux-safety measure, to ensure that adjacent data that belongs to another class is most likely predicted as noise instead of slipping through for verification. How do you check if this classifier is working well — in our experiments, we define this as the number of ‘uncertain’ samples in a classifier’s prediction. Using uncertainty sampling and information gain principles, we were effectively able to gauge if a classifier is ‘learning’ or not, which acts as a pointer towards classification performance. This classifier is consistently retrained unless there is an inflection point in the number of uncertain samples predicted, or there is only a delta of information being added iteratively by new samples. Proxy active learning via an LLM agent This is the heart of the approach — using an LLM as a proxy for a human validator. The human validator approach we are talking about is Active Labelling Let’s get an intuitive understanding of Active Labelling: Use an ML model to learn on a sample input dataset, predict on a large set of datapoints For the predictions given on the datapoints, a subject-matter expert (SME) evaluates ‘validity’ of predictions Recursively, new ‘corrected’ samples are added as training data to the ML model The ML model consistently learns/retrains, and makes predictions until the SME is satisfied by the quality of predictions For Active Labelling to work, there are expectations involved for an SME: when we expect a human expert to ‘validate’ an output sample, the expert understands what the task is a human expert will use judgement to evaluate ‘what else’ definitely belongs to a label L when deciding if a new sample should belong to L Given these expectations and intuitions, we can ‘mimic’ these using an LLM: give the LLM an ‘understanding’ of what each label means. This can be done by using a larger model to critically evaluate the relationship between {label: data mapped to label} for all labels. In our experiments, this was done using a 32B variant of DeepSeek that was self-hosted. Giving an LLM the capability to understand ‘why, what, and how’ Instead of predicting what is the correct label, leverage the LLM to identify if a prediction is ‘valid’ or ‘invalid’ only (i.e., LLM only has to answer a binary query). Reinforce the idea of what other valid samples for the label look like, i.e., for every pre-emptively predicted label for a sample, dynamically source c closest samples in its training (guaranteed valid) set when prompting for validation. The result? A cost-effective framework that relies on a fast, cheap classifier to make pre-emptive classifications, and an LLM that verifies these using (meaning of the label + dynamically sourced training samples that are similar to the current classification): import math def calculate_uncertainty(clf, sample): predicted_probabilities = clf.predict_proba(sample.reshape(1, -1))[0] # Reshape sample for predict_proba uncertainty = -sum(p * math.log(p, 2) for p in predicted_probabilities) return uncertainty def select_informative_samples(clf, data, k): informative_samples = [] uncertainties = [calculate_uncertainty(clf, sample) for sample in data] # Sort data by descending order of uncertainty sorted_data = sorted(zip(data, uncertainties), key=lambda x: x[1], reverse=True) # Get top k samples with highest uncertainty for sample, uncertainty in sorted_data[:k]: informative_samples.append(sample) return informative_samples def proxy_label(clf, llm_judge, k, testing_data): #llm_judge – any LLM with a system prompt tuned for verifying if a sample belongs to a class. Expected output is a bool : True or False. True verifies the original classification, False refutes it predicted_classes = clf.predict(testing_data) # Select k most informative samples using uncertainty sampling informative_samples = select_informative_samples(clf, testing_data, k) # List to store correct samples voted_data = [] # Evaluate informative samples with the LLM judge for sample in informative_samples: sample_index = testing_data.tolist().index(sample.tolist()) # changed from testing_data.index(sample) because of numpy array type issue predicted_class = predicted_classes[sample_index] # Check if LLM judge agrees with the prediction if llm_judge(sample, predicted_class): # If correct, add the sample to voted data voted_data.append(sample) # Return the list of correct samples with proxy labels return voted_data By feeding the valid samples (voted_data) to our classifier under controlled parameters, we achieve the ‘recursive’ part of our algorithm: Recursive Expert Delegation: R.E.D. By doing this, we were able to achieve close-to-human-expert validation numbers on controlled multi-class datasets. Experimentally, R.E.D. scales up to 1,000 classes while maintaining a competent degree of accuracy almost on par with human experts (90%+ agreement). I believe this is a significant achievement in applied ML, and has real-world uses for production-grade expectations of cost, speed, scale, and adaptability. The technical report, publishing later this year, highlights relevant code samples as well as experimental setups used to achieve given results. All images, unless otherwise noted, are by the author Interested in more details? Reach out to me over Medium or email for a chat!

With the new age of problem-solving augmented by Large Language Models (LLMs), only a handful of problems remain that have subpar solutions. Most classification problems (at a PoC level) can be solved by leveraging LLMs at 70–90% Precision/F1 with just good prompt engineering techniques, as well as adaptive in-context-learning (ICL) examples.

What happens when you want to consistently achieve performance higher than that — when prompt engineering no longer suffices?

The classification conundrum

Text classification is one of the oldest and most well-understood examples of supervised learning. Given this premise, it should really not be hard to build robust, well-performing classifiers that handle a large number of input classes, right…?

Welp. It is.

It actually has to do a lot more with the ‘constraints’ that the algorithm is generally expected to work under:

  • low amount of training data per class
  • high classification accuracy (that plummets as you add more classes)
  • possible addition of new classes to an existing subset of classes
  • quick training/inference
  • cost-effectiveness
  • (potentially) really large number of training classes
  • (potentially) endless required retraining of some classes due to data drift, etc.

Ever tried building a classifier beyond a few dozen classes under these conditions? (I mean, even GPT could probably do a great job up to ~30 text classes with just a few samples…)

Considering you take the GPT route — If you have more than a couple dozen classes or a sizeable amount of data to be classified, you are gonna have to reach deep into your pockets with the system prompt, user prompt, few shot example tokens that you will need to classify one sample. That is after making peace with the throughput of the API, even if you are running async queries.

In applied ML, problems like these are generally tricky to solve since they don’t fully satisfy the requirements of supervised learning or aren’t cheap/fast enough to be run via an LLM. This particular pain point is what the R.E.D algorithm addresses: semi-supervised learning, when the training data per class is not enough to build (quasi)traditional classifiers.

The R.E.D. algorithm

R.E.D: Recursive Expert Delegation is a novel framework that changes how we approach text classification. This is an applied ML paradigm — i.e., there is no fundamentally different architecture to what exists, but its a highlight reel of ideas that work best to build something that is practical and scalable.

In this post, we will be working through a specific example where we have a large number of text classes (100–1000), each class only has few samples (30–100), and there are a non-trivial number of samples to classify (10,000–100,000). We approach this as a semi-supervised learning problem via R.E.D.

Let’s dive in.

How it works

simple representation of what R.E.D. does

Instead of having a single classifier classify between a large number of classes, R.E.D. intelligently:

  1. Divides and conquers — Break the label space (large number of input labels) into multiple subsets of labels. This is a greedy label subset formation approach.
  2. Learns efficiently — Trains specialized classifiers for each subset. This step focuses on building a classifier that oversamples on noise, where noise is intelligently modeled as data from other subsets.
  3. Delegates to an expert — Employes LLMs as expert oracles for specific label validation and correction only, similar to having a team of domain experts. Using an LLM as a proxy, it empirically ‘mimics’ how a human expert validates an output.
  4. Recursive retraining — Continuously retrains with fresh samples added back from the expert until there are no more samples to be added/a saturation from information gain is achieved

The intuition behind it is not very hard to grasp: Active Learning employs humans as domain experts to consistently ‘correct’ or ‘validate’ the outputs from an ML model, with continuous training. This stops when the model achieves acceptable performance. We intuit and rebrand the same, with a few clever innovations that will be detailed in a research pre-print later.

Let’s take a deeper look…

Greedy subset selection with least similar elements

When the number of input labels (classes) is high, the complexity of learning a linear decision boundary between classes increases. As such, the quality of the classifier deteriorates as the number of classes increases. This is especially true when the classifier does not have enough samples to learn from — i.e. each of the training classes has only a few samples.

This is very reflective of a real-world scenario, and the primary motivation behind the creation of R.E.D.

Some ways of improving a classifier’s performance under these constraints:

  • Restrict the number of classes a classifier needs to classify between
  • Make the decision boundary between classes clearer, i.e., train the classifier on highly dissimilar classes

Greedy Subset Selection does exactly this — since the scope of the problem is Text Classification, we form embeddings of the training labels, reduce their dimensionality via UMAP, then form S subsets from them. Each of the subsets has elements as training labels. We pick training labels greedily, ensuring that every label we pick for the subset is the most dissimilar label w.r.t. the other labels that exist in the subset:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def avg_embedding(candidate_embeddings):
    return np.mean(candidate_embeddings, axis=0)

def get_least_similar_embedding(target_embedding, candidate_embeddings):
    similarities = cosine_similarity(target_embedding, candidate_embeddings)
    least_similar_index = np.argmin(similarities)  # Use argmin to find the index of the minimum
    least_similar_element = candidate_embeddings[least_similar_index]
    return least_similar_element


def get_embedding_class(embedding, embedding_map):
    reverse_embedding_map = {value: key for key, value in embedding_map.items()}
    return reverse_embedding_map.get(embedding)  # Use .get() to handle missing keys gracefully


def select_subsets(embeddings, n):
    visited = {cls: False for cls in embeddings.keys()}
    subsets = []
    current_subset = []

    while any(not visited[cls] for cls in visited):
        for cls, average_embedding in embeddings.items():
            if not current_subset:
                current_subset.append(average_embedding)
                visited[cls] = True
            elif len(current_subset) >= n:
                subsets.append(current_subset.copy())
                current_subset = []
            else:
                subset_average = avg_embedding(current_subset)
                remaining_embeddings = [emb for cls_, emb in embeddings.items() if not visited[cls_]]
                if not remaining_embeddings:
                    break # handle edge case
                
                least_similar = get_least_similar_embedding(target_embedding=subset_average, candidate_embeddings=remaining_embeddings)

                visited_class = get_embedding_class(least_similar, embeddings)

                
                if visited_class is not None:
                  visited[visited_class] = True


                current_subset.append(least_similar)
    
    if current_subset:  # Add any remaining elements in current_subset
        subsets.append(current_subset)
        

    return subsets

the result of this greedy subset sampling is all the training labels clearly boxed into subsets, where each subset has at most only classes. This inherently makes the job of a classifier easier, compared to the original classes it would have to classify between otherwise!

Semi-supervised classification with noise oversampling

Cascade this after the initial label subset formation — i.e., this classifier is only classifying between a given subset of classes.

Picture this: when you have low amounts of training data, you absolutely cannot create a hold-out set that is meaningful for evaluation. Should you do it at all? How do you know if your classifier is working well?

We approached this problem slightly differently — we defined the fundamental job of a semi-supervised classifier to be pre-emptive classification of a sample. This means that regardless of what a sample gets classified as it will be ‘verified’ and ‘corrected’ at a later stage: this classifier only needs to identify what needs to be verified.

As such, we created a design for how it would treat its data:

  • n+1 classes, where the last class is noise
  • noise: data from classes that are NOT in the current classifier’s purview. The noise class is oversampled to be 2x the average size of the data for the classifier’s labels

Oversampling on noise is a faux-safety measure, to ensure that adjacent data that belongs to another class is most likely predicted as noise instead of slipping through for verification.

How do you check if this classifier is working well — in our experiments, we define this as the number of ‘uncertain’ samples in a classifier’s prediction. Using uncertainty sampling and information gain principles, we were effectively able to gauge if a classifier is ‘learning’ or not, which acts as a pointer towards classification performance. This classifier is consistently retrained unless there is an inflection point in the number of uncertain samples predicted, or there is only a delta of information being added iteratively by new samples.

Proxy active learning via an LLM agent

This is the heart of the approach — using an LLM as a proxy for a human validator. The human validator approach we are talking about is Active Labelling

Let’s get an intuitive understanding of Active Labelling:

  • Use an ML model to learn on a sample input dataset, predict on a large set of datapoints
  • For the predictions given on the datapoints, a subject-matter expert (SME) evaluates ‘validity’ of predictions
  • Recursively, new ‘corrected’ samples are added as training data to the ML model
  • The ML model consistently learns/retrains, and makes predictions until the SME is satisfied by the quality of predictions

For Active Labelling to work, there are expectations involved for an SME:

  • when we expect a human expert to ‘validate’ an output sample, the expert understands what the task is
  • a human expert will use judgement to evaluate ‘what else’ definitely belongs to a label L when deciding if a new sample should belong to L

Given these expectations and intuitions, we can ‘mimic’ these using an LLM:

  • give the LLM an ‘understanding’ of what each label means. This can be done by using a larger model to critically evaluate the relationship between {label: data mapped to label} for all labels. In our experiments, this was done using a 32B variant of DeepSeek that was self-hosted.
Giving an LLM the capability to understand ‘why, what, and how’
  • Instead of predicting what is the correct label, leverage the LLM to identify if a prediction is ‘valid’ or ‘invalid’ only (i.e., LLM only has to answer a binary query).
  • Reinforce the idea of what other valid samples for the label look like, i.e., for every pre-emptively predicted label for a sample, dynamically source c closest samples in its training (guaranteed valid) set when prompting for validation.

The result? A cost-effective framework that relies on a fast, cheap classifier to make pre-emptive classifications, and an LLM that verifies these using (meaning of the label + dynamically sourced training samples that are similar to the current classification):

import math

def calculate_uncertainty(clf, sample):
    predicted_probabilities = clf.predict_proba(sample.reshape(1, -1))[0]  # Reshape sample for predict_proba
    uncertainty = -sum(p * math.log(p, 2) for p in predicted_probabilities)
    return uncertainty


def select_informative_samples(clf, data, k):
    informative_samples = []
    uncertainties = [calculate_uncertainty(clf, sample) for sample in data]

    # Sort data by descending order of uncertainty
    sorted_data = sorted(zip(data, uncertainties), key=lambda x: x[1], reverse=True)

    # Get top k samples with highest uncertainty
    for sample, uncertainty in sorted_data[:k]:
        informative_samples.append(sample)

    return informative_samples


def proxy_label(clf, llm_judge, k, testing_data):
    #llm_judge - any LLM with a system prompt tuned for verifying if a sample belongs to a class. Expected output is a bool : True or False. True verifies the original classification, False refutes it
    predicted_classes = clf.predict(testing_data)

    # Select k most informative samples using uncertainty sampling
    informative_samples = select_informative_samples(clf, testing_data, k)

    # List to store correct samples
    voted_data = []

    # Evaluate informative samples with the LLM judge
    for sample in informative_samples:
        sample_index = testing_data.tolist().index(sample.tolist()) # changed from testing_data.index(sample) because of numpy array type issue
        predicted_class = predicted_classes[sample_index]

        # Check if LLM judge agrees with the prediction
        if llm_judge(sample, predicted_class):
            # If correct, add the sample to voted data
            voted_data.append(sample)

    # Return the list of correct samples with proxy labels
    return voted_data

By feeding the valid samples (voted_data) to our classifier under controlled parameters, we achieve the ‘recursive’ part of our algorithm:

Recursive Expert Delegation: R.E.D.

By doing this, we were able to achieve close-to-human-expert validation numbers on controlled multi-class datasets. Experimentally, R.E.D. scales up to 1,000 classes while maintaining a competent degree of accuracy almost on par with human experts (90%+ agreement).

I believe this is a significant achievement in applied ML, and has real-world uses for production-grade expectations of cost, speed, scale, and adaptability. The technical report, publishing later this year, highlights relevant code samples as well as experimental setups used to achieve given results.

All images, unless otherwise noted, are by the author

Interested in more details? Reach out to me over Medium or email for a chat!

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Fluent Bit vulnerabilities could enable full cloud takeover

Attackers could flood monitoring systems with false or misleading events, hide alerts in the noise, or even hijack the telemetry stream entirely, Katz said. The issue is now tracked as CVE-2025-12969 and awaits a severity valuation. Almost equally troubling are other flaws in the “tag” mechanism, which determines how the records are

Read More »

Carney Loses Quebec Minister After Energy Deal

Prime Minister Mark Carney lost a cabinet minister over his oil pipeline agreement with Alberta, marking the first major fracture in his Liberal Party caucus over the government’s energy policies. Steven Guilbeault, a former environmental activist with Greenpeace, resigned his position as culture minister but will remain in Parliament. In a statement posted on social media Thursday evening, Guilbeault panned the agreement reached with Alberta, saying the government failed to consult Indigenous communities and the pipeline would pose significant environmental risks. “When I entered politics, it was because I had a deep conviction that I could make a difference in fighting climate change,” he wrote. “My commitment to leaving a better world for the future of our children and our planet remains unchanged.” Guilbeault was first elected to office in 2019, and served as both heritage minister and environment minister under former Prime Minister Justin Trudeau. He also has a high profile in the province of Quebec, a crucial electoral battleground that helped Carney win government in this year’s election. In his letter, Guilbeault noted that the government has dismantled much of his legacy as environment minister — repealing consumer carbon pricing, delaying a policy to encourage zero-emission vehicles and scrapping the oil and gas sector emissions cap. Carney thanked his former minister in a statement late Thursday. “While we may have differing views at times on how exactly we make essential progress, I am glad Steven will continue to offer his important perspectives as a Member of Parliament in our Liberal caucus,” the prime minister said.  Quebec Voice Earlier in the day, Carney had unveiled an agreement with Danielle Smith, premier of the oil-rich province of Alberta. The document pledges federal government support for a new oil pipeline to Canada’s west coast and scraps some Trudeau-era environmental regulations in exchange for Alberta’s agreement to hike

Read More »

Two Oil Tankers Suffer Mystery Blasts While in Black Sea

(Update) November 28, 2025, 6:14 PM GMT: Article updated. Two ocean-going tankers that are heavily sanctioned for carrying Russian oil suffered near-simultaneous blasts off Turkey’s Black Sea coast. The first, the 900-foot Kairos, was taking on water after an explosion, according to a local port agent report. Turkey’s Directorate General for Maritime Affairs confirmed the incident and said a second ship, the Virat, had also been struck near its coastline and was billowing smoke. The causes are unclear and a rescue operation for both ships was underway.  The pair are two of hundreds of vessels that were amassed to help keep Russia’s oil moving after it invaded Ukraine. Kairos is sanctioned by the UK and European Union, while Virat was designated by the US and EU.  DENİZCİLİK GENEL MÜDÜRLÜĞÜ@denizcilikgm  VIRAT isimli tanker, Karadeniz’de takribi 35 deniz mili açıkta isabet aldığını bildirmiş, olay yerine kurtarma unsurları ve ticari gemi yönlendirilmiştir. Gemideki 20 personelin durumu iyi olup makine dairesinde yoğun duman tespit edilmiştir. Süreç takip edilmektedir. Sent via Twitter for Android. View original tweet. It’s not the first time that ships linked to Moscow have suffered explosions this year. There was also a spate of blasts in the early months of 2025 that hit merchant ships with a history of calling at Russian ports.  It’s not yet known what happened to these vessels and, if they were attacked, who was responsible. Spain’s navy, which issues navigational warnings in the region, says there’s also a significant risk posed by floating mines in parts of the Black Sea since the conflict began. Kairos is a Suezmax-class vessel whose previous voyage was from the Russian port of Novorossiysk to Paradip in India, hauling Moscow’s flagship crude grade Urals. It was heading back to the Russian port to load its next cargo at the time of the incident,

Read More »

Oil Notches Fourth Monthly Drop

Oil posted a fourth monthly loss as traders looked ahead to an OPEC+ meeting this weekend and assessed how the potential of easing geopolitical tensions from Kyiv to Caracas may impact an oversupplied market. West Texas Intermediate edged down to settle below $59 a barrel, after earlier gaining as much as 1.7%, to close out the longest streak of monthly drops since March 2023. The commodity slid to intra-day lows minutes before settlement as The New York Times reported that US President Donald Trump and Venezuelan counterpart Nicolás Maduro discussed a potential meeting in a call last week. A de-escalation between the Trump administration and the oil-rich South American country would sap a major risk premium out of oil prices.  The late-day dip capped off a choppy trading session, marked by thin holiday volumes and an hours-long outage on Chicago Mercantile Exchange’s trading platform that roiled global markets. The halt — which the company said was a result of a cooling issue in a data center — also impacted gasoline and diesel futures that are due to expire on Friday.  OPEC+ nations are set to meet virtually on Sunday and will probably stick with a plan to pause output increases in early 2026, delegates said. With that decision locked in, a key focus may be a long-term review of members’ capacity. US oil has fallen 18% this year, with prices hurt by expectations for a global glut after OPEC+ restarted capacity, while drillers outside the alliance also added supplies.  On Ukraine, Russian President Vladimir Putin said that US President Donald Trump’s proposals for ending Moscow’s war could be the basis for future agreements and expressed an openness to talks, though sticking points that led to stalemates in previous rounds remain. US presidential envoy Steve Witkoff is expected to visit Moscow next week.  The

Read More »

CME Futures Outage Disrupts Trading

(Update) November 28, 2025, 11:00 AM GMT: Article updated. Trading of futures and options on the Chicago Mercantile Exchange was halted by a data-center fault, causing hours of disruption to markets across equities, foreign exchange, bonds and commodities. The malfunction was caused by cooling system problems at a data center in the Chicago area, according to facility operator CyrusOne. Engineering teams have restarted several chillers and deployed temporary cooling equipment, a spokesperson said, without giving a time for the resumption of normal operations.  The halt is already longer than a similar, hours-long outage due to a technical error back in 2019 and underscores the reach of CME Group Inc. and its Globex electronic trading platform. It triggered widespread frustration as market participants contemplated the prospect of a lost trading session. Millions of contracts tracking the S&P 500, Dow Jones Industrial Average and Nasdaq 100 trade every weekday virtually around the clock on the CME, one of the world’s largest derivatives exchanges. “It’s a bit like flying dark,” said Thomas Helaine, head of equity sales at TP ICAP Europe in Paris. “When you’re trading cash equity like us, US futures give you an indication of where the market is going before the open. I can only imagine how complicated it must be for derivatives desks.” The outage halted trading of US Treasury futures, while European and UK bond markets that trade on a different exchange were unaffected. EBS, a platform used in foreign exchange, was impacted, hurting price discovery in the market. For some traders, the timing of the disruption on Friday could cause particular inconvenience if it lasts, due to the need to roll positions from one monthly contract to another.  “Traders sitting with a position are certainly quite angry,” said Gnanasekar Thiagarajan, head of trading and hedging strategies at Kaleesuwari Intercontinental. Gold saw

Read More »

Petrobras Slumps After Unveiling $109B Spending Plan

Brazilian oil major Petrobras announced a 2% decrease in its next five-year investment plan to $109 billion, putting dividend payments in doubt at a time of lower oil prices. Shares fell. The state-controlled oil producer is caught between the government’s desire to grow the economy – especially ahead of a 2026 presidential election – and investors who demand high dividends and low debt. While Petrobras announced a regular dividend payout of at least $45 billion for the 2026-2030 period, similar to the previous plan, it didn’t commit to pay any extraordinary payouts to shareholders.  Petrobras shares slid as much as 3.4% in Sao Paulo on Friday, the largest intraday drop since August, while Brent prices were are slightly lower. “The absence of short-term capex optimization could result in single-digit dividend yields ,” Itau Unibanco Holding SA said in a note to clients. “This could be perceived as disappointing by investors.” Petroleo Brasileiro SA, as it is formally known, will direct $91 billion of the total capital expenditure to projects under implementation, of which $10 billion will still need budget confirmation subject to a financing analysis. The rest is still under analysis “with a lower degree of maturity,” it said in a filing on Thursday. The spending plan is being closely watched by investors as it has an important political dimension in Brazil. The company is a major source of cash for the federal budget. It is the first time Petrobras has reduced its five-year budget after President Luiz Inacio Lula da Silva took office in 2023.  The previous plan was based on an oil price assumption of $83 a barrel, while Brent crude is currently trading near $63.  Petrobras earmarked 71.6% of the 2026-2030 plan, or $78 billion, for exploration and production. That includes boosting output at its deep-water fields

Read More »

Sanctioned Tanker at Risk of Sinking After Blast

An oil tanker from Russia’s shadow fleet reportedly hit a mine and was at risk of sinking north of Turkey’s coastline, a local port agent said.  The vessel, called Kairos, suffered an explosion and a fire broke out on board. Turkey’s Directorate General for Maritime Affairs said an external impact caused the blaze aboard the 900-foot ship, and vessels have been dispatched to evacuate the 25 crew on board. It wasn’t carrying a cargo at the time.  The port agent report said the vessel could have hit a mine. Kairos is a Suezmax tanker that has been sanctioned by the UK and the EU for carrying Russian oil, but not by the US. Its previous voyage was from the Russian port of Novorossiysk to Paradip in India, hauling Urals crude. It was heading back to the Russian port to load its next cargo, according to vessel tracking data compiled by Bloomberg. There has been a persistent risk of vessels being hit by mines in the region since Russia’s war in Ukraine began, with a handful of ships suffering explosions as a result. Earlier this year, there were also a series of mystery explosions on board ships that have carried oil for Russia outside of the Black Sea. An email address and phone number listed on a maritime database as the vessel’s manager didn’t respond to requests for comment outside of normal business hours. The Bosphorus, a key trade artery for commodities including Russian oil from ports in the Black Sea, remains open. The Kairos sails under the flag of Gambia, the agent said. Rusya’nın Novoroski limanına seyreden boş KAIROS tankeri, kıyılarımızdan 28 mil açıkta, dışarıdan bir etkiyle yangın çıktığı ihbarı alınmış olup gemideki 25 personelin durumu iyidir, denizcilerin tahliyesi için bölgeye kurtarma unsurlarımız sevk edilmiş, süreç takip edilmektedir. pic.twitter.com/rVcHPXL4YC — DENİZCİLİK

Read More »

Microsoft loses two senior AI infrastructure leaders as data center pressures mount

Microsoft did not immediately respond to a request for comment. Microsoft’s constraints Analysts say the twin departures mark a significant setback for Microsoft at a critical moment in the AI data center race, with pressure mounting from both OpenAI’s model demands and Google’s infrastructure scale. “Losing some of the best professionals working on this challenge could set Microsoft back,” said Neil Shah, partner and co-founder at Counterpoint Research. “Solving the energy wall is not trivial, and there may have been friction or strategic differences that contributed to their decision to move on, especially if they saw an opportunity to make a broader impact and do so more lucratively at a company like Nvidia.” Even so, Microsoft has the depth and ecosystem strength to continue doubling down on AI data centers, said Prabhu Ram, VP for industry research at Cybermedia Research. According to Sanchit Gogia, chief analyst at Greyhound Research, the departures come at a sensitive moment because Microsoft is trying to expand its AI infrastructure faster than physical constraints allow. “The executives who have left were central to GPU cluster design, data center engineering, energy procurement, and the experimental power and cooling approaches Microsoft has been pursuing to support dense AI workloads,” Gogia said. “Their exit coincides with pressures the company has already acknowledged publicly. GPUs are arriving faster than the company can energize the facilities that will house them, and power availability has overtaken chip availability as the real bottleneck.”

Read More »

What is Edge AI? When the cloud isn’t close enough

Many edge devices can periodically send summarized or selected inference output data back to a central system for model retraining or refinement. That feedback loop helps the model improve over time while still keeping most decisions local. And to run efficiently on constrained edge hardware, the AI model is often pre-processed by techniques such as quantization (which reduces precision), pruning (which removes redundant parameters), or knowledge distillation (which trains a smaller model to mimic a larger one). These optimizations reduce the model’s memory, compute, and power demands so it can run more easily on an edge device. What technologies make edge AI possible? The concept of the “edge” always assumes that edge devices are less computationally powerful than data centers and cloud platforms. While that remains true, overall improvements in computational hardware have made today’s edge devices much more capable than those designed just a few years ago. In fact, a whole host of technological developments have come together to make edge AI a reality. Specialized hardware acceleration. Edge devices now ship with dedicated AI-accelerators (NPUs, TPUs, GPU cores) and system-on-chip units tailored for on-device inference. For example, companies like Arm have integrated AI-acceleration libraries into standard frameworks so models can run efficiently on Arm-based CPUs. Connectivity and data architecture. Edge AI often depends on durable, low-latency links (e.g., 5G, WiFi 6, LPWAN) and architectures that move compute closer to data. Merging edge nodes, gateways, and local servers means less reliance on distant clouds. And technologies like Kubernetes can provide a consistent management plane from the data center to remote locations. Deployment, orchestration, and model lifecycle tooling. Edge AI deployments must support model-update delivery, device and fleet monitoring, versioning, rollback and secure inference — especially when orchestrated across hundreds or thousands of locations. VMware, for instance, is offering traffic management

Read More »

Networks, AI, and metaversing

Our first, conservative, view says that AI’s network impact is largely confined to the data center, to connect clusters of GPU servers and the data they use as they crunch large language models. It’s all “horizontal” traffic; one TikTok challenge would generate way more traffic in the wide area. WAN costs won’t rise for you as an enterprise, and if you’re a carrier you won’t be carrying much new, so you don’t have much service revenue upside. If you don’t host AI on premises, you can pretty much dismiss its impact on your network. Contrast that with the radical metaverse view, our third view. Metaverses and AR/VR transform AI missions, and network services, from transaction processing to event processing, because the real world is a bunch of events pushing on you. They also let you visualize the way that process control models (digital twins) relate to the real world, which is critical if the processes you’re modeling involve human workers who rely on their visual sense. Could it be that the reason Meta is willing to spend on AI, is that the most credible application of AI, and the most impactful for networks, is the metaverse concept? In any event, this model of AI, by driving the users’ experiences and activities directly, demands significant edge connectivity, so you could expect it to have a major impact on network requirements. In fact, just dipping your toes into a metaverse could require a major up-front network upgrade. Networks carry traffic. Traffic is messages. More messages, more traffic, more infrastructure, more service revenue…you get the picture. Door number one, to the AI giant future, leads to nothing much in terms of messages. Door number three, metaverses and AR/VR, leads to a message, traffic, and network revolution. I’ll bet that most enterprises would doubt

Read More »

Microsoft’s Fairwater Atlanta and the Rise of the Distributed AI Supercomputer

Microsoft’s second Fairwater data center in Atlanta isn’t just “another big GPU shed.” It represents the other half of a deliberate architectural experiment: proving that two massive AI campuses, separated by roughly 700 miles, can operate as one coherent, distributed supercomputer. The Atlanta installation is the latest expression of Microsoft’s AI-first data center design: purpose-built for training and serving frontier models rather than supporting mixed cloud workloads. It links directly to the original Fairwater campus in Wisconsin, as well as to earlier generations of Azure AI supercomputers, through a dedicated AI WAN backbone that Microsoft describes as the foundation of a “planet-scale AI superfactory.” Inside a Fairwater Site: Preparing for Multi-Site Distribution Efficient multi-site training only works if each individual site behaves as a clean, well-structured unit. Microsoft’s intra-site design is deliberately simplified so that cross-site coordination has a predictable abstraction boundary—essential for treating multiple campuses as one distributed AI system. Each Fairwater installation presents itself as a single, flat, high-regularity cluster: Up to 72 NVIDIA Blackwell GPUs per rack, using GB200 NVL72 rack-scale systems. NVLink provides the ultra-low-latency, high-bandwidth scale-up fabric within the rack, while the Spectrum-X Ethernet stack handles scale-out. Each rack delivers roughly 1.8 TB/s of GPU-to-GPU bandwidth and exposes a multi-terabyte pooled memory space addressable via NVLink—critical for large-model sharding, activation checkpointing, and parallelism strategies. Racks feed into a two-tier Ethernet scale-out network offering 800 Gbps GPU-to-GPU connectivity with very low hop counts, engineered to scale to hundreds of thousands of GPUs without encountering the classic port-count and topology constraints of traditional Clos fabrics. Microsoft confirms that the fabric relies heavily on: SONiC-based switching and a broad commodity Ethernet ecosystem to avoid vendor lock-in and accelerate architectural iteration. Custom network optimizations, such as packet trimming, packet spray, high-frequency telemetry, and advanced congestion-control mechanisms, to prevent collective

Read More »

Land & Expand: Hyperscale, AI Factory, Megascale

Land & Expand is Data Center Frontier’s periodic roundup of notable North American data center development activity, tracking the newest sites, land plays, retrofits, and hyperscale campus expansions shaping the industry’s build cycle. October delivered a steady cadence of announcements, with several megascale projects advancing from concept to commitment. The month was defined by continued momentum in OpenAI and Oracle’s Stargate initiative (now spanning multiple U.S. regions) as well as major new investments from Google, Meta, DataBank, and emerging AI cloud players accelerating high-density reuse strategies. The result is a clearer picture of how the next wave of AI-first infrastructure is taking shape across the country. Google Begins $4B West Memphis Hyperscale Buildout Google formally broke ground on its $4 billion hyperscale campus in West Memphis, Arkansas, marking the company’s first data center in the state and the anchor for a new Mid-South operational hub. The project spans just over 1,000 acres, with initial site preparation and utility coordination already underway. Google and Entergy Arkansas confirmed a 600 MW solar generation partnership, structured to add dedicated renewable supply to the regional grid. As part of the launch, Google announced a $25 million Energy Impact Fund for local community affordability programs and energy-resilience improvements—an unusually early community-benefit commitment for a first-phase hyperscale project. Cooling specifics have not yet been made public. Water sourcing—whether reclaimed, potable, or hybrid seasonal mode—remains under review, as the company finalizes environmental permits. Public filings reference a large-scale onsite water treatment facility, similar to Google’s deployments in The Dalles and Council Bluffs. Local governance documents show that prior to the October announcement, West Memphis approved a 30-year PILOT via Groot LLC (Google’s land assembly entity), with early filings referencing a typical placeholder of ~50 direct jobs. At launch, officials emphasized hundreds of full-time operations roles and thousands

Read More »

The New Digital Infrastructure Geography: Green Street’s David Guarino on AI Demand, Power Scarcity, and the Next Phase of Data Center Growth

As the global data center industry races through its most frenetic build cycle in history, one question continues to define the market’s mood: is this the peak of an AI-fueled supercycle, or the beginning of a structurally different era for digital infrastructure? For Green Street Managing Director and Head of Global Data Center and Tower Research David Guarino, the answer—based firmly on observable fundamentals—is increasingly clear. Demand remains blisteringly strong. Capital appetite is deepening. And the very definition of a “data center market” is shifting beneath the industry’s feet. In a wide-ranging discussion with Data Center Frontier, Guarino outlined why data centers continue to stand out in the commercial real estate landscape, how AI is reshaping underwriting and development models, why behind-the-meter power is quietly reorganizing the U.S. map, and what Green Street sees ahead for rents, REITs, and the next wave of hyperscale expansion. A ‘Safe’ Asset in an Uncertain CRE Landscape Among institutional investors, the post-COVID era was the moment data centers stepped decisively out of “niche” territory. Guarino notes that pandemic-era reliance on digital services crystallized a structural recognition: data centers deliver stable, predictable cash flows, anchored by the highest-credit tenants in global real estate. Hyperscalers today dominate new leasing and routinely sign 15-year (or longer) contracts, a duration largely unmatched across CRE categories. When compared with one-year apartment leases, five-year office leases, or mall anchor terms, the stability story becomes plain. “These are AAA-caliber companies signing the longest leases in the sector’s history,” Guarino said. “From a real estate point of view, that combination of tenant quality and lease duration continues to position the asset class as uniquely durable.” And development returns remain exceptional. Even without assuming endless AI growth, the math works: strong demand, rising rents, and high-credit tenants create unusually predictable performance relative to

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »