
Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation


Introduction

Many generative AI use cases still revolve around Retrieval Augmented Generation (RAG), yet RAG solutions consistently fall short of user expectations. Despite the growing body of research on RAG improvements, and even adding Agents into the process, many solutions still fail to return exhaustive results, miss information that is critical but infrequently mentioned in the documents, require multiple search iterations, and generally struggle to reconcile key themes across multiple documents. To top it all off, many implementations still rely on cramming as much “relevant” information as possible into the model’s context window alongside detailed system and user prompts. Reconciling all this information often exceeds the model’s cognitive capacity and compromises response quality and consistency.

This is where our Agentic Knowledge Distillation + Pyramid Search Approach comes into play. Instead of chasing the best chunking strategy, retrieval algorithm, or inference-time reasoning method, my team (Jim Brown, Mason Sawtell, Sandi Besen, and I) takes an agentic approach to document ingestion.

We leverage the full capability of the model at ingestion time to focus exclusively on distilling and preserving the most meaningful information from the document dataset. This fundamentally simplifies the RAG process by allowing the model to direct its reasoning abilities toward addressing the user/system instructions rather than struggling to understand formatting and disparate information across document chunks. 

We specifically target high-value questions that are often difficult to evaluate because they have multiple correct answers or solution paths. These cases are where traditional RAG solutions struggle most, and existing RAG evaluation datasets are largely insufficient for testing this problem space. For our research implementation, we downloaded annual and quarterly reports from the last year for the 30 companies in the Dow Jones Industrial Average. These documents can be found through the SEC EDGAR website. The information on EDGAR is freely accessible and can be downloaded or queried through EDGAR’s public search. See the SEC privacy policy for additional details; information on the SEC website is “considered public information and may be copied or further distributed by users of the web site without the SEC’s permission”. We selected this dataset for two key reasons: first, it falls outside the knowledge cutoff for the models evaluated, ensuring that the models cannot answer questions from their pre-training knowledge; second, it is a close approximation of real-world business problems while allowing us to discuss and share our findings using publicly available data. 

While typical RAG solutions excel at factual retrieval where the answer is easily identified in the document dataset (e.g., “When did Apple’s annual shareholders’ meeting occur?”), they struggle with nuanced questions that require a deeper understanding of concepts across documents (e.g., “Which of the Dow companies has the most promising AI strategy?”). Our Agentic Knowledge Distillation + Pyramid Search Approach addresses these types of questions with much greater success than the other standard approaches we tested and overcomes limitations associated with using knowledge graphs in RAG systems. 

In this article, we’ll cover how our knowledge distillation process works, key benefits of this approach, examples, and an open discussion on the best way to evaluate these types of systems where, in many cases, there is no singular “right” answer.

Building the pyramid: How Agentic Knowledge Distillation works

AI-generated image showing a pyramid structure for document ingestion with labelled sections.
Image by author and team depicting pyramid structure for document ingestion. Robots meant to represent agents building the pyramid.

Overview

Our knowledge distillation process creates a multi-tiered pyramid of information from the raw source documents. Our approach is inspired by the pyramids used in deep learning computer vision-based tasks, which allow a model to analyze an image at multiple scales. We take the contents of the raw document, convert it to markdown, and distill the content into a list of atomic insights, related concepts, document abstracts, and general recollections/memories. During retrieval it’s possible to access any or all levels of the pyramid to respond to the user request. 
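Before walking through the steps, here is a minimal sketch of how the distilled layers might be represented in code; the class and field names are illustrative rather than our exact schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PyramidNode:
    """One distilled item stored at a single level of the pyramid."""
    level: str                                 # "insight", "concept", "abstract", or "recollection"
    text: str                                  # natural-language content of the node
    embedding: Optional[list[float]] = None    # vector used for hybrid search
    source_document: Optional[str] = None      # e.g. "ibm-10q-q3-2024.pdf"
    source_pages: list[int] = field(default_factory=list)


@dataclass
class DocumentPyramid:
    """Distilled layers produced for a single source document."""
    document_id: str
    insights: list[PyramidNode] = field(default_factory=list)   # atomic SVO facts per page
    concepts: list[PyramidNode] = field(default_factory=list)   # themes connecting insights
    abstract: Optional[PyramidNode] = None                      # one dense summary per document


# Recollections/memories sit above individual documents, so they live in a
# dataset-level collection rather than inside any single DocumentPyramid.
recollections: list[PyramidNode] = []
```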

How to distill documents and build the pyramid: 

  1. Convert documents to Markdown: Convert all raw source documents to Markdown. We’ve found that models process Markdown better for this task than other formats such as JSON, and it is more token-efficient. We used Azure Document Intelligence to generate the Markdown for each page of the document, but there are many other open-source libraries, like MarkItDown, that do the same thing. Our dataset included 331 documents and 16,601 pages. 
  2. Extract atomic insights from each page: We process documents using a two-page sliding window, which allows each page to be analyzed twice. This gives the agent the opportunity to correct any mistakes made when the page was first processed. We instruct the model to create a numbered list of insights that grows as it processes the pages in the document. Because it sees each page twice, the agent can overwrite insights from the previous page if they were incorrect. We instruct the model to extract insights in simple sentences following the subject-verb-object (SVO) format and to write sentences as if English is the second language of the user. This significantly improves performance by encouraging clarity and precision. Rolling over each page multiple times and using the SVO format also helps solve the disambiguation problem, which is a major challenge for knowledge graphs. The insight generation step is also particularly helpful for extracting information from tables since the model captures the facts from the table in clear, succinct sentences. Our dataset produced 216,931 total insights, about 13 insights per page and 655 insights per document. A minimal code sketch of this sliding-window loop appears after this list.
  3. Distilling concepts from insights: From the detailed list of insights, we identify higher-level concepts that connect related information about the document. This step significantly reduces noise and redundant information in the document while preserving essential information and themes. Our dataset produced 14,824 total concepts, about 1 concept per page and 45 concepts per document. 
  4. Creating abstracts from concepts: Given the insights and concepts in the document, the LLM writes an abstract that, in our experience, reads better than a human-written abstract and is more information-dense than any abstract present in the original document. The LLM-generated abstract provides comprehensive knowledge about the document in a small number of tokens. We produce one abstract per document, 331 total.
  5. Storing recollections/memories across documents: At the top of the pyramid we store critical information that is useful across all tasks. This can be information the user shares about the task or information the agent learns about the dataset over time by researching and responding to tasks. For example, we can store the current 30 companies in the Dow as a recollection, since this list differs from the 30 companies in the Dow at the time of the model’s knowledge cutoff. As we conduct more research tasks, we continuously improve our recollections and maintain an audit trail of which documents each recollection originated from. For example, we can keep track of AI strategies across companies, where companies are making major investments, etc. These high-level connections are especially important since they reveal relationships and information that are not apparent in any single page or document.
Sample subset of insights extracted from IBM 10Q, Q3 2024
Sample subset of insights extracted from IBM 10Q, Q3 2024 (page 4)
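Below is a minimal sketch of the sliding-window insight extraction described in step 2. The `call_llm` callable is a stand-in for whatever chat-completion client you use, and the prompt text is a loose paraphrase of our instructions rather than the exact prompt.

```python
def extract_insights(pages_markdown: list[str], call_llm) -> list[str]:
    """Build a growing, numbered list of atomic insights using a two-page
    sliding window, so every page is analyzed twice and insights from the
    previous pass can be corrected."""
    insights: list[str] = []
    for i in range(len(pages_markdown)):
        window = pages_markdown[max(0, i - 1): i + 1]  # previous page + current page
        prompt = (
            "Maintain a numbered list of atomic insights from this document.\n"
            "Write each insight as a short subject-verb-object sentence, as if\n"
            "English is the reader's second language. You may rewrite or remove\n"
            "insights from the previous page if they were incorrect.\n\n"
            "Current insights:\n" + "\n".join(insights) + "\n\n"
            "Pages:\n" + "\n\n".join(window)
        )
        # call_llm(prompt) is assumed to return the full, updated numbered list
        # as plain text; we replace our running list with it on every pass.
        reply = call_llm(prompt)
        insights = [line.strip() for line in reply.splitlines() if line.strip()]
    return insights
```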

We store the text and embeddings for each layer of the pyramid (pages and up) in Azure PostgreSQL. We originally used Azure AI Search, but switched to PostgreSQL for cost reasons. This required us to write our own hybrid search function since PostgreSQL doesn’t yet natively support this feature. This implementation would work with any vector database or vector index of your choosing. The key requirement is to store and efficiently retrieve both text and vector embeddings at any level of the pyramid. 
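For reference, a simplified version of such a hybrid search might blend pgvector cosine similarity with Postgres full-text rank, as sketched below. The `pyramid_nodes` table, column names, score weights, and the psycopg/pgvector calls are assumptions for illustration, not our production function.

```python
import psycopg
from pgvector.psycopg import register_vector  # pgvector helper for psycopg 3

HYBRID_SQL = """
WITH vector_hits AS (
    SELECT id, text, 1 - (embedding <=> %(embedding)s) AS vec_score
    FROM pyramid_nodes
    WHERE level = %(level)s
    ORDER BY embedding <=> %(embedding)s
    LIMIT 50
),
text_hits AS (
    SELECT id, text,
           ts_rank(to_tsvector('english', text),
                   plainto_tsquery('english', %(query)s)) AS txt_score
    FROM pyramid_nodes
    WHERE level = %(level)s
      AND to_tsvector('english', text) @@ plainto_tsquery('english', %(query)s)
    LIMIT 50
)
SELECT COALESCE(v.text, t.text) AS text,
       COALESCE(v.vec_score, 0) * 0.7 + COALESCE(t.txt_score, 0) * 0.3 AS score
FROM vector_hits v
FULL OUTER JOIN text_hits t USING (id)
ORDER BY score DESC
LIMIT %(top_k)s;
"""


def hybrid_search(conn, query: str, query_embedding, level: str, top_k: int = 10):
    """Blend semantic (pgvector cosine) and keyword (Postgres full-text) scores
    for a single pyramid level and return the top-ranked node texts.
    query_embedding is expected to be a numpy array of the embedded query."""
    register_vector(conn)  # lets psycopg pass numpy-array embeddings as vectors
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, {
            "embedding": query_embedding,
            "query": query,
            "level": level,
            "top_k": top_k,
        })
        return cur.fetchall()
```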

This approach captures the essence of a knowledge graph but stores information in natural language, the way an LLM natively wants to interact with it, and is more token-efficient at retrieval. We also let the LLM pick the terms used to categorize each level of the pyramid; this seemed to let the model decide for itself the best way to describe and differentiate between the information stored at each level. For example, the LLM preferred “insights” to “facts” as the label for the first level of distilled knowledge. Our goal in doing this was to better understand how an LLM thinks about the process by letting it decide how to store and group related information. 

Using the pyramid: How it works with RAG & Agents

At inference time, both traditional RAG and agentic approaches benefit from the pre-processed, distilled information ingested into our knowledge pyramid. The pyramid structure allows for efficient retrieval in both the traditional RAG case, where only the top X related pieces of information are retrieved, and in the agentic case, where the agent iteratively plans, retrieves, and evaluates information before returning a final response. 

The benefit of the pyramid approach is that information at any and all levels of the pyramid can be used during inference. For our implementation, we used PydanticAI to create a search agent that takes in the user request, generates search terms, explores ideas related to the request, and keeps track of information relevant to the request. Once the search agent determines there’s sufficient information to address the user request, the results are re-ranked and sent back to the LLM to generate a final reply. Our implementation allows a search agent to traverse the information in the pyramid as it gathers details about a concept/search term. This is similar to walking a knowledge graph, but in a way that’s more natural for the LLM since all the information in the pyramid is stored in natural language.
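A heavily simplified sketch of what such a PydanticAI search agent can look like is shown below. The model string, dependency types, and tool body are illustrative, and attribute names may differ slightly between PydanticAI versions, so treat this as an outline rather than our exact implementation.

```python
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext


@dataclass
class SearchDeps:
    conn: object    # open database connection used by hybrid_search
    embed: object   # callable that turns a query string into an embedding


search_agent = Agent(
    "openai:gpt-4o",  # any model supported by PydanticAI could be used here
    deps_type=SearchDeps,
    system_prompt=(
        "Research the user's request by generating search terms, calling the "
        "search tool across pyramid levels, and tracking relevant facts until "
        "you have enough information to answer."
    ),
)


@search_agent.tool
def search_pyramid(ctx: RunContext[SearchDeps], query: str, level: str) -> list[str]:
    """Search one pyramid level ('insight', 'concept', 'abstract', or 'recollection')."""
    embedding = ctx.deps.embed(query)
    rows = hybrid_search(ctx.deps.conn, query, embedding, level)  # from the earlier sketch
    return [text for text, _score in rows]


# Example usage (requires a live database connection and embedding function):
# result = search_agent.run_sync(
#     "Which of the Dow companies has the most promising AI strategy?",
#     deps=SearchDeps(conn=conn, embed=embed),
# )
# print(result.output)  # `result.data` in older PydanticAI releases
```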

Depending on the use case, the Agent could access information at all levels of the pyramid or only at specific levels (e.g. only retrieve information from the concepts). For our experiments, we did not retrieve raw page-level data since we wanted to focus on token efficiency and found the LLM-generated information for the insights, concepts, abstracts, and recollections was sufficient for completing our tasks. In theory, the Agent could also have access to the page data; this would provide additional opportunities for the agent to re-examine the original document text; however, it would also significantly increase the total tokens used. 

Here is a high-level visualization of our Agentic approach to responding to user requests:

Overview of the agentic research & response process
Image created by author and team providing an overview of the agentic research & response process

Results from the pyramid: Real-world examples

To evaluate the effectiveness of our approach, we tested it against a variety of question categories, including typical fact-finding questions and complex cross-document research and analysis tasks. 

Fact-finding (spear fishing): 

These tasks require identifying specific information or facts that are buried in a document. These are the types of questions typical RAG solutions target but often require many searches and consume lots of tokens to answer correctly. 

Example task: “What was IBM’s total revenue in the latest financial reporting?”

Example response using pyramid approach: “IBM’s total revenue for the third quarter of 2024 was $14.968 billion [ibm-10q-q3-2024.pdf, pg. 4].”

Screenshot of total tokens used to research and generate response
Total tokens used to research and generate response

This result is correct (human-validated) and was generated using only 9,994 total tokens, with 1,240 tokens in the generated final response. 

Complex research and analysis: 

These tasks involve researching and understanding multiple concepts to gain a broader understanding of the documents and make inferences and informed assumptions based on the gathered facts.

Example task: “Analyze the investments Microsoft and NVIDIA are making in AI and how they are positioning themselves in the market. The report should be clearly formatted.”

Example response:

Screenshot of the response generated by the agent analyzing AI investments and positioning for Microsoft and NVIDIA.
Response generated by the agent analyzing AI investments and positioning for Microsoft and NVIDIA.

The result is a comprehensive report that was generated quickly and contains detailed information about each of the companies. 26,802 total tokens were used to research and respond to the request, with a significant percentage of them used for the final response (2,893 tokens, or ~11%). These results were also reviewed by a human to verify their validity.

Screenshot of snippet indicating total token usage for the task
Snippet indicating total token usage for the task

Example task: “Create a report on analyzing the risks disclosed by the various financial companies in the DOW. Indicate which risks are shared and unique.”

Example response:

Screenshot of part 1 of a response generated by the agent on disclosed risks.
Part 1 of response generated by the agent on disclosed risks.
Screenshot of part 2 of a response generated by the agent on disclosed risks.
Part 2 of response generated by the agent on disclosed risks.

Similarly, this task was completed in 42.7 seconds and used 31,685 total tokens, with 3,116 tokens used to generate the final report. 

Screenshot of a snippet indicating total token usage for the task
Snippet indicating total token usage for the task

These results for both fact-finding and complex analysis tasks demonstrate that the pyramid approach efficiently creates detailed reports with low latency using a minimal number of tokens. The tokens used for the tasks carry dense meaning with little noise, allowing for high-quality, thorough responses across tasks.

Benefits of the pyramid: Why use it?

Overall, we found that our pyramid approach provided a significant boost in response quality and overall performance for high-value questions. 

Some of the key benefits we observed include: 

  • Reduced cognitive load: When the agent receives the user task, it retrieves pre-processed, distilled information rather than raw, inconsistently formatted, disparate document chunks. This fundamentally improves the retrieval process since the model doesn’t waste its cognitive capacity trying to break down the page/chunk text for the first time. 
  • Superior table processing: By breaking down table information and storing it in concise but descriptive sentences, the pyramid approach makes it easier to retrieve relevant information at inference time through natural language queries. This was particularly important for our dataset since financial reports contain lots of critical information in tables. 
  • Improved response quality for many types of requests: The pyramid enables more comprehensive, context-aware responses to both precise fact-finding questions and broad, analysis-based tasks that involve many themes across numerous documents. 
  • Preservation of critical context: Since the distillation process identifies and keeps track of key facts, important information that might appear only once in the document is easier to maintain. For example, noting that all tables are represented in millions of dollars or in a particular currency. Traditional chunking methods often cause this type of information to slip through the cracks. 
  • Optimized token usage, memory, and speed: By distilling information at ingestion time, we significantly reduce the number of tokens required during inference, are able to maximize the value of information put in the context window, and improve memory use. 
  • Scalability: Many solutions struggle to perform as the size of the document dataset grows. This approach provides a much more efficient way to manage a large volume of text by preserving only critical information. It also allows for more efficient use of the LLM’s context window by sending it only useful, clear information.
  • Efficient concept exploration: The pyramid enables the agent to explore related information similar to navigating a knowledge graph, but does not require ever generating or maintaining relationships in the graph. The agent can use natural language exclusively and keep track of important facts related to the concepts it’s exploring in a highly token-efficient and fluid way. 
  • Emergent dataset understanding: An unexpected benefit of this approach emerged during our testing. When asking questions like “what can you tell me about this dataset?” or “what types of questions can I ask?”, the system is able to respond and suggest productive search topics because it has a more robust understanding of the dataset context by accessing higher levels in the pyramid like the abstracts and recollections. 

Beyond the pyramid: Evaluation challenges & future directions

Challenges

While the results we’ve observed when using the pyramid search approach have been nothing short of amazing, finding ways to establish meaningful metrics to evaluate the entire system both at ingestion time and during information retrieval is challenging. Traditional RAG and Agent evaluation frameworks often fail to address nuanced questions and analytical responses where many different responses are valid.

Our team plans to write a research paper on this approach in the future, and we are open to any thoughts and feedback from the community, especially when it comes to evaluation metrics. Many of the existing datasets we found were focused on evaluating RAG use cases within one document or precise information retrieval across multiple documents rather than robust concept and theme analysis across documents and domains. 

The main use cases we are interested in relate to broader questions that are representative of how businesses actually want to interact with GenAI systems. For example, “tell me everything I need to know about customer X” or “how do the behaviors of Customer A and B differ? Which am I more likely to have a successful meeting with?”. These types of questions require a deep understanding of information across many sources. The answers typically require a person to synthesize data from multiple areas of the business and think critically about it. As a result, the answers to these questions are rarely written or saved anywhere, which makes it impossible to simply store and retrieve them through a vector index in a typical RAG process. 

Another consideration is that many real-world use cases involve dynamic datasets where documents are consistently being added, edited, and deleted. This makes it difficult to evaluate and track what a “correct” response is since the answer will evolve as the available information changes. 

Future directions

In the future, we believe that the pyramid approach can address some of these challenges by enabling more effective processing of dense documents and storing learned information as recollections. However, tracking and evaluating the validity of the recollections over time will be critical to the system’s overall success and remains a key focus area for our ongoing work. 

When applying this approach to organizational data, the pyramid process could also be used to identify and assess discrepancies across areas of the business. For example, uploading all of a company’s sales pitch decks could surface where certain products or services are being positioned inconsistently. It could also be used to compare insights extracted from various line of business data to help understand if and where teams have developed conflicting understandings of topics or different priorities. This application goes beyond pure information retrieval use cases and would allow the pyramid to serve as an organizational alignment tool that helps identify divergences in messaging, terminology, and overall communication. 

Conclusion: Key takeaways and why the pyramid approach matters

The knowledge distillation pyramid approach is significant because it leverages the full power of the LLM at both ingestion and retrieval time. Our approach allows you to store dense information in fewer tokens, which has the added benefit of reducing noise in the dataset at inference. Our approach also runs very quickly and is incredibly token-efficient: we are able to generate responses within seconds, explore potentially hundreds of searches, and on average use (this includes all the search iterations!). 

We find that the LLM is much better at writing atomic insights as sentences and that these insights effectively distill information from both text-based and tabular data. This distilled information, written in natural language, is very easy for the LLM to understand and navigate at inference time since it does not have to expend unnecessary energy reasoning about and breaking down document formatting or filtering through noise.

The ability to retrieve and aggregate information at any level of the pyramid also provides significant flexibility to address a variety of query types. This approach offers promising performance for large datasets and enables high-value use cases that require nuanced information retrieval and analysis. 


Note: The opinions expressed in this article are solely my own and do not necessarily reflect the views or policies of my employer.

Interested in discussing further or collaborating? Reach out on LinkedIn!

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Agentic AI: What now, what next?

Agentic AI burst onto the scene with its promises of streamliningoperations and accelerating productivity. But what’s real and what’s hype when it comes to deploying agentic AI? This Special Report examines the state of agentic AI, the challenges organizations are facing in deploying it, and the lessons learned from success

Read More »

AMD to build two more supercomputers at Oak Ridge National Labs

Lux is engineered to train, refine, and deploy AI foundation models that accelerate scientific and engineering progress. Its advanced architecture supports data-intensive and model-centric workloads, thereby enhancing AI-driven research capabilities. Discovery differs from Lux in that it uses Instinct MI430X GPUs instead of the 300 series. The MI400 Series is

Read More »

Iberdrola Raises Stake in Brazil’s Neoenergia

Banco do Brasil’s pension fund has sold its stake in Neoenergia SA to Iberdrola SA for EUR 1.92 billion ($2.22 billion). The acquisition amounted to 30.29 percent of Neoenergia’s capital, Spanish renewables-focused utility Iberdrola said in a statement on its website. It said the purchase grows its stake in Neoenergia to about 84 percent. “Neoenergia supplies electricity to nearly 40 million Brazilians through five distributors (in the states of Bahia, Rio Grande do Norte, Pernambuco, Sao Paulo, Mato Grosso do Sul and Brasilia) and 18 transmission lines, making it the country’s leading distribution group in terms of number of customers”, Iberdrola said. “Neoenergia has more than 725,000 kilometers of distribution lines and 8,000 kilometers of transmission lines and has 3,800 MW of renewable generation, mainly hydroelectric. “With this operation, Iberdrola reaffirms its commitment to Brazil and takes a new step in its growth strategy based on the electricity grid business, in which it has 1.4 million kilometers of lines in the United States, the United Kingdom, Brazil and Spain”. Across Brazil, Neoenergia has an installed capacity of 3.9 gigawatts through 44 wind farms, four hydro plants and the two solar plants, Neoenergia says on its website. For the third quarter, Neoenergia reported BRL 924 million ($171.52 million) in profit and BRL 2.8 billion in EBITDA, up 10 percent and 13 percent year-on-year respectively. “In the year to September, Neoenergia achieved BRL 7.6 billion in capex, of which BRL 4.8 billion was concentrated in distribution, which represents 31 percent more investment in this business compared to the same period last year”, it said in its quarterly report October 27. Iberdrola has allotted Brazil EUR 7 billion in its 2024-28 global investment plan of EUR 58 billion, of which 65 percent is for regulated networks. The previous plan for the period was EUR 41 billion. Distribution has been

Read More »

Samsung to Build New Qatar Carbon Capture Project

QatarEnergy has awarded Samsung C&T Corp the engineering, procurement and construction contract for a carbon capture and storage (CCS) project that will serve existing natural gas liquefaction facilities in Ras Laffan Industrial City. “The new project will capture and sequester up to 4.1 million tons of CO2 per annum, making it one of the world’s largest of its kind and placing Qatar at the forefront of global large-scale carbon capture deployment, reinforcing its leadership role in providing responsible and sustainable energy”, state-owned integrated energy company QatarEnergy said in a press release. It said it had “launched” its first CCS project, with a capacity of 2.2 million metric tons per annum (MTPA), in 2019. “Two other ongoing CCS projects will serve the North Field East and North Field South expansion projects, capturing and storing 2.1 MTPA and 1.2 MTPA of CO2 respectively”, QatarEnergy added. QatarEnergy president and chief executive Saad Sherida Al-Kaabi, who is also Qatar’s energy minister, said, “All our LNG expansion projects will deploy CCS technologies, with an aim to capture over 11 MTPA of CO2 by 2035.” QatarEnergy aims to double its liquefied natural gas (LNG) production capacity to 160 MMtpa through the North Field expansion projects in Qatar and Golden Pass LNG in Texas. The United States project will begin production by year-end, Al-Kaabi told the World Gas Conference in Beijing earlier this year. The first liquefaction train from the North Field east expansion project will start production by mid-2026. “As for North Field West, it is in the engineering phase and will be going into the construction phase somewhere in 2027”, Al-Kaabi said then. “QatarEnergy will be the largest single LNG exporter as a company while Qatar, as a country, will be the second-largest exporter of LNG after the United States for a very long time”,

Read More »

Hungary Prepares Bill For Fuel Emergencies

Hungary has drafted legislation on steps to be taken in case of a fuel-supply emergency, just days after a major fire at the country’s sole oil refinery and following a US decision to impose sanctions on Russian energy companies. The bill designates so-called emergency fuel stations, regulated by the government, if there’s a significant fuel-supply disruption, according to the text of the measure posted on the Energy Ministry’s website. Prime Minister Viktor Orban’s government has maintained that the country’s fuel supply is assured, even as Mol Nyrt. has yet to disclose the extent to which production was affected at its Danube refinery following a blast and fire more than a week ago. Erste Bank initially estimated that as much as 40% of the refinery’s output was affected. Orban on Thursday said that authorities are still investigating the cause of the fire. Energy-security risks have been compounded by the US decision last week to sanction major Russian oil producers in a bid to pressure Russian President Vladimir Putin to the negotiating table to end his war on Ukraine. Unlike many of its European Union neighbors, Hungary has ramped up purchases of Russian energy following Moscow’s full-scale invasion of Ukraine in 2022 and now gets almost all of its crude oil imports from Russia. Orban will meet Donald Trump at the White House on Nov. 7, where he’ll seek to convince the US president to grant an exemption from sanctions to allow Hungary to continue purchasing oil and natural gas from Russia, Cabinet Minister Gergely Gulyas told reporters on Thursday. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed.

Read More »

Xcel Energy rolls out $60 billion capital spending plan

By the Numbers: Xcel Energy Q3 2025 $524M Quarterly earnings, down 23% from 2024 on higher depreciation, interest charges and O&M expenses, partially offset by improved recovery from infrastructure investments. 3 GW Capacity of contracted or ‘high probability’ data center load. The utility says it is tracking additional deals that could exceed 20 GW in new load. $60B Five-year capital spending plan Accelerated Growth The Minneapolis-based utility serves about 3.9 million electric customers in parts of Colorado, Michigan, Minnesota, New Mexico, North Dakota, South Dakota, Texas and Wisconsin. The company expects retail sales to grow 5% through 2030. A 3 GW pipeline of contracted and “high probability” data center projects will drive the majority of that growth, according to the company. Leaders believe Xcel’s data center queue could exceed 20 GW if earlier-stage prospects materialize. Xcel Energy announced a $15 billion addition to its five-year capital plan on Thursday, which CEO Bob Frenzel said will now cover 7.5 GW of new renewable generation, 3 GW of new gas generation, 1.9 GW of energy storage, 1,500 miles of high-voltage transmission and $5 billion for wildfire mitigation. Xcel and two telecom companies agreed to a $640 million settlement with plaintiffs in a lawsuit over the 2021-2022 Marshall Fire in Colorado in September. The company excluded a $290 million charge from its share of the Marshall Wildfire settlement in Colorado from quarterly earnings metrics, it noted. Xcel’s long-term vision includes the addition of 4.5 GW of new natural gas capacity as well as 5 GW of energy storage, Frenzel said. “Making sure that we can deliver a cleaner energy product as well as a highly reliable and highly affordable product is very strategic as we approach economic development with data centers,” Frenzel said. New data center load represents about 60% of Xcel’s anticipated retail sales growth through

Read More »

Lukoil to Sell Assets to Gunvor Amid Sanctions

Russian oil producer Lukoil PJSC has agreed to sell its international assets to energy trader Gunvor Group, a week after being hit by US sanctions. The country’s No. 2 oil producer said it had accepted an offer from Gunvor and made a commitment not to negotiate with other potential buyers. If successful, the deal would involve the transfer of a sprawling global network of oil fields, refineries and gas stations to one of the world’s top independent commodity traders.  The US last week blacklisted oil giants Rosneft PJSC and Lukoil as part of a fresh bid to end the war in Ukraine by depriving Moscow of revenues. It was the first major package of sanctions on Russia’s petroleum industry since US President Donald Trump took office, and has left governments and business partners clambering to understand the impact. The offer — for which no value was disclosed — includes Lukoil International’s trading arm Litasco, but not the business units in Dubai which have recently become subject to sanctions, said a person familiar with the matter. Gunvor itself has had a long history with Russia. Its co-founder Gennady Timchenko was placed under US sanctions in the wake of the Kremlin’s annexation of Crimea in 2014, with the US government claiming at the time that Russian President Vladimir Putin had “investments in Gunvor,” which the company has consistently denied.  Since Timchenko sold his shares, it’s now majority-owned by co-founder and chief executive officer Torbjorn Tornqvist.  After making record profits from recent volatility in energy markets, cash-rich commodity traders are spending big on assets to help lock in better margins for the future. A potential deal could provide Gunvor with a system of upstream and downstream businesses akin to the trading units of majors like BP Plc and Shell Plc. The deal is subject to

Read More »

Energy Department Announces $100 Million to Restore America’s Coal Plants

WASHINGTON— The U.S. Department of Energy (DOE) today issued a Notice of Funding Opportunity (NOFO) for up to $100 million in federal funding to refurbish and modernize the nation’s existing coal power plants. It follows the Department’s September announcement of its intent to invest $625 million to expand and reinvigorate America’s coal industry. The effort will support practical, high-impact projects that improve efficiency, plant lifetimes, and performance of coal and natural gas use. “For years, the Biden and Obama administrations relentlessly targeted America’s coal industry and workers, resulting in the closure of reliable power plants and higher electricity costs,” said U.S. Secretary of Energy Chris Wright. “Thankfully, President Trump has ended the war on American coal and is restoring common sense energy policies that put Americans first. These projects will help keep America’s coal plants operating and ensure the United States has the reliable and affordable power it needs to keep the lights on and power our future.” This effort supports President Trump’s Executive Orders, Reinvigorating America’s Beautiful Clean Coal Industry and Strengthening the Reliability and Security of the United States Electric Grid, and advances his commitment to restore U.S. energy dominance. This NOFO seeks applications for projects to design, implement, test, and validate three strategic opportunities for refurbishment and retrofit of existing American coal power plants to make them operate more efficiently, reliably, and affordably: Development, engineering, and implementation of advanced wastewater management systems capable of cost-effective water recovery and other value-added byproducts from wastewater streams. Engineering, design, and implementation of retrofit systems that enable fuel switching between coal and natural gas without compromising critical operational parameters. Deployment, engineering, and implementation of advanced coal-natural gas co-firing systems and system components, including highly fuel-flexible burner designs and advanced control systems, to maximize gas co-firing capacity to provide a low cost retrofit option for coal plants while minimizing efficiency penalties. DOE’s National Energy

Read More »

Supermicro Unveils Data Center Building Blocks to Accelerate AI Factory Deployment

Supermicro has introduced a new business line, Data Center Building Block Solutions (DCBBS), expanding its modular approach to data center development. The offering packages servers, storage, liquid-cooling infrastructure, networking, power shelves and battery backup units (BBUs), DCIM and automation software, and on-site services into pre-validated, factory-tested bundles designed to accelerate time-to-online (TTO) and improve long-term serviceability. This move represents a significant step beyond traditional rack integration; a shift toward a one-stop, data-center-scale platform aimed squarely at the hyperscale and AI factory market. By providing a single point of accountability across IT, power, and thermal domains, Supermicro’s model enables faster deployments and reduces integration risk—the modern equivalent of a “single throat to choke” for data center operators racing to bring GB200/NVL72-class racks online. What’s New in DCBBS DCBBS extends Supermicro’s modular design philosophy to an integrated catalog of facility-adjacent building blocks, not just IT nodes. By including critical supporting infrastructure—cooling, power, networking, and lifecycle software—the platform helps operators bring new capacity online more quickly and predictably. According to Supermicro, DCBBS encompasses: Multi-vendor AI system support: Compatibility with NVIDIA, AMD, and Intel architectures, featuring Supermicro-designed cold plates that dissipate up to 98% of component-level heat. In-rack liquid-cooling designs: Coolant distribution manifolds (CDMs) and CDUs rated up to 250 kW, supporting 45 °C liquids, alongside rear-door heat exchangers, 800 GbE switches (51.2 Tb/s), 33 kW power shelves, and 48 V battery backup units. Liquid-to-Air (L2A) sidecars: Each row can reject up to 200 kW of heat without modifying existing building hydronics—an especially practical design for air-to-liquid retrofits. Automation and management software: SuperCloud Composer for rack-scale and liquid-cooling lifecycle management SuperCloud Automation Center for firmware, OS, Kubernetes, and AI pipeline enablement Developer Experience Console for self-service workflows and orchestration End-to-end services: Design, validation, and on-site deployment options—including four-hour response service levels—for both greenfield builds


Investments Anchor Vertiv’s Growth Strategy as AI-Driven Data Center Orders Surge 60% YoY

New Acquisitions and Partner Awards
Vertiv’s third-quarter financial performance was underscored by a series of strategic acquisitions and ecosystem recognitions that expand the company’s technological capabilities and market reach amid AI-driven demand.

Acquisition of Waylay NV: AI and Hyperautomation for Infrastructure Intelligence
On August 26, Vertiv announced its acquisition of Waylay NV, a Belgium-based developer of generative AI and hyperautomation software. The move bolsters Vertiv’s portfolio with AI-driven monitoring, predictive services, and performance optimization for digital infrastructure. Waylay’s automation platform integrates real-time analytics, orchestration, and workflow automation across diverse connected assets and cloud services—enabling predictive maintenance, uptime optimization, and energy management across power and cooling systems. “With the addition of Waylay’s technology and software-focused team, Vertiv will accelerate its vision of intelligent infrastructure—data-driven, proactive, and optimized for the world’s most demanding environments,” said CEO Giordano Albertazzi.

Completion of Great Lakes Acquisition: Expanding White Space Integration
Just days earlier, as alluded to above, Vertiv finalized its $200 million acquisition of Great Lakes Data Racks & Cabinets, a U.S.-based manufacturer of enclosures and integrated rack systems. The addition expands Vertiv’s capabilities in high-density, factory-integrated white space solutions, bridging power, cooling, and IT enclosures for hyperscale and edge data centers alike. Great Lakes’ U.S. and European manufacturing footprint complements Vertiv’s global reach, supporting faster deployment cycles and expanded configuration flexibility. Albertazzi noted that the acquisition “enhances our ability to deliver comprehensive infrastructure solutions, furthering Vertiv’s capabilities to customize at scale and configure at speed for AI and high-density computing environments.”

2024 Partner Awards: Recognizing the Ecosystem Behind Growth
Vertiv also spotlighted its partner ecosystem in August with its 2024 North America Partner Awards. The company recognized 11 partners for 2024 performance, growth, and AI execution across segments: Partner of the Year – SHI for launching a customer-facing high-density AI & Cyber Labs featuring


QuEra’s Quantum Leap: From Neutral-Atom Breakthroughs to Hybrid HPC Integration

The race to make quantum computing practical – and commercially consequential – took a major step forward this fall, as Boston-based QuEra Computing announced new research milestones, expanded strategic funding, and an accelerating roadmap for hybrid quantum-classical supercomputing. QuEra’s Chief Commercial Officer Yuval Boger joined the Data Center Frontier Show to discuss how neutral-atom quantum systems are moving from research labs into high-performance computing centers and cloud environments worldwide.

NVIDIA Joins Google in Backing QuEra’s $230 Million Round
In early September, QuEra disclosed that NVentures, NVIDIA’s venture arm, has joined Google and others in expanding its $230 million Series B round. The investment deepens what has already been one of the most active collaborations between quantum and accelerated-computing companies. “We already work with NVIDIA, pairing our scalable neutral-atom architecture with its accelerated-computing stack to speed the arrival of useful, fault-tolerant quantum machines,” said QuEra CEO Andy Ory. “The decision to invest in us underscores our shared belief that hybrid quantum-classical systems will unlock meaningful value for customers sooner than many expect.” The partnership spans hardware, software, and go-to-market initiatives. QuEra’s neutral-atom machines are being integrated into NVIDIA’s CUDA-Q software platform for hybrid workloads, while the two companies collaborate at the NVIDIA Accelerated Quantum Center (NVAQC) in Boston, linking QuEra hardware with NVIDIA’s GB200 NVL72 GPU clusters for simulation and quantum-error-decoder research. Meanwhile, at Japan’s AIST ABCI-Q supercomputing center, QuEra’s Gemini-class quantum computer now operates beside more than 2,000 H100 GPUs, serving as a national testbed for hybrid workflows. A jointly developed transformer-based decoder running on NVIDIA’s GPUs has already outperformed classical maximum-likelihood error-correction models, marking a concrete step toward practical fault-tolerant quantum computing. For NVIDIA, the move signals conviction that quantum processing units (QPUs) will one day complement GPUs inside large-scale data centers. For QuEra, it widens access to the
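The hybrid QPU-plus-GPU error-correction loop described above can be sketched in a few lines. This is a conceptual illustration only, not QuEra’s or NVIDIA’s actual interfaces; the `qpu`, `decoder`, and `logical_op` objects are hypothetical placeholders.

```python
# Conceptual sketch (not vendor code) of one step of a hybrid fault-tolerant loop:
# the quantum device produces error syndromes, a GPU-hosted neural decoder
# (e.g., transformer-based) proposes corrections, and the classical side applies
# them before the next logical operation.

def fault_tolerant_step(qpu, decoder, logical_op):
    syndromes = qpu.measure_syndromes()        # stabilizer measurements from the QPU
    corrections = decoder.decode(syndromes)    # GPU-side decoder stands in for max-likelihood decoding
    qpu.apply_corrections(corrections)
    return qpu.apply_logical(logical_op)
```

The design constraint is latency: the decoder has to return corrections fast enough to keep up with the QPU’s syndrome-measurement cycle, which is one reason pairing quantum hardware with adjacent GPU clusters matters.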


How CoreWeave and Poolside Are Teaming Up in West Texas to Build the Next Generation of AI Data Centers

In the evolving landscape of artificial-intelligence infrastructure, a singular truth is emerging: access to cutting-edge silicon and massive GPU clusters is no longer enough by itself. For companies chasing the frontier of multi-trillion-parameter model training and agentic AI deployment, the bottleneck increasingly lies not just in compute, but in the seamless integration of compute + power + data center scale. The latest chapter in this story is the collaboration between CoreWeave and Poolside, culminating in the launch of Project Horizon, a 2-gigawatt AI-campus build in West Texas.

Setting the Stage: Who’s Involved, and Why It Matters
CoreWeave (NASDAQ: CRWV) has positioned itself as “The Essential Cloud for AI™” — a company founded in 2017, publicly listed in March 2025, and aggressively building out its footprint of ultra-high-performance infrastructure. One of its strategic moves: in July 2025, CoreWeave struck a definitive agreement to acquire Core Scientific (NASDAQ: CORZ) in an all-stock transaction. Through that deal, CoreWeave gains control of approximately 1.3 GW of gross power across Core Scientific’s nationwide data center footprint, plus more than 1 GW of expansion potential. That acquisition underlines a broader trend: AI-specialist clouds are no longer renting space and power; they’re working to own or tightly control it. Poolside, founded in 2023, is a foundation-model company with an ambitious mission: building artificial general intelligence (AGI) and deploying enterprise-scale agents. According to Poolside’s blog: “When people ask what it takes to build frontier AI … the focus is usually on the model … but that’s only half the story. The other half is infrastructure. If you don’t control your infrastructure, you don’t control your destiny—and you don’t have a shot at the frontier.” Simply put: if you’re chasing multi-trillion-parameter models, you need both the compute horsepower and the power infrastructure, and ideally tight vertical integration. Together, the


Vantage Data Centers Pours $15B Into Wisconsin AI Campus as It Builds Global Giga-Scale Footprint

Expanding in Ohio: Financing Growth Through Green Capital
In June 2025, Vantage secured $5 billion in green loan capacity, including $2.25 billion to fully fund its New Albany, Ohio (OH1) campus and expand its existing borrowing base. The 192 MW development will comprise three 64 MW buildings, with first delivery expected in December 2025 and phased completion through 2028. The OH1 campus is designed to come online as Vantage’s larger megasites ramp up, providing early capacity and regional proximity to major cloud and AI customers in the Columbus–New Albany corridor. The site also offers logistical and workforce advantages within one of the fastest-growing data center regions in the U.S.

Beyond the U.S. – Vantage Expands Its Global Footprint

Moving North: Reinforcing Canada’s Renewable Advantage
In February 2025, Vantage announced a C$500 million investment to complete QC24, the fourth and final building at its Québec City campus, adding 32 MW of capacity by 2027. The project strengthens Vantage’s Montreal–Québec platform and reinforces its renewable-heavy power profile, leveraging abundant hydropower to serve sustainability-driven customers.

APAC Expansion: Strategic Scale in Southeast Asia
In September 2025, Vantage unveiled a $1.6 billion APAC expansion, led by existing investors GIC (Singapore’s sovereign wealth fund) and ADIA (Abu Dhabi Investment Authority). The investment includes the acquisition of Yondr’s Johor, Malaysia campus at Sedenak Tech Park. Currently delivering 72.5 MW, the Johor campus is planned to scale to 300 MW at full build-out, positioning it within one of Southeast Asia’s most active AI and cloud growth corridors. Analysts note that the location’s connectivity to Singapore’s hyperscale market and favorable development economics give Vantage a strong competitive foothold across the region.

Italy: Expanding European Presence Under National Priority Status
Vantage is also adding a second Italian campus alongside its existing Milan site, totaling 32 MW across two facilities. Phase


Nvidia GTC show news you need to know round-up

In the case of Flex, it will use digital twins to unify inventory, labor, and freight operations, streamlining logistics across Flex’s worldwide network. Flex’s new 400,000 sq. ft. facility in Dallas is purpose-built for data center infrastructure, aiming to significantly shorten lead times for U.S. customers. The Flex/Nvidia partnership aims to address the country’s labor shortages and drive innovation in manufacturing, pharmaceuticals, and technology. The companies believe the partnership sets the stage for a new era of giga-scale AI factories.

Nvidia and Oracle to Build DOE’s Largest AI Supercomputer
Oracle continues its aggressive push into supercomputing with a deal to build the largest AI supercomputer for scientific discovery — using Nvidia GPUs, obviously — at a Department of Energy facility. The system, dubbed Solstice, will feature an incredible 100,000 Nvidia Blackwell GPUs. A second system, dubbed Equinox, will include 10,000 Blackwell GPUs and is expected to be available in the first half of 2026. Both systems will be interconnected by Nvidia networking and deliver a combined 2,200 exaflops of AI performance. The Solstice and Equinox supercomputers will be located at Argonne National Laboratory, home to the Aurora supercomputer, which was built using all Intel parts. They will enable scientists and researchers to develop and train new frontier models and AI reasoning models for open science using the Nvidia Megatron-Core library, and to scale them using the Nvidia TensorRT inference software stack.
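As a quick sanity check on the quoted figures, the combined 2,200 exaflops over 110,000 total GPUs implies a per-GPU number; the arithmetic below uses only the article’s own figures, and the result presumably reflects low-precision AI math rather than traditional FP64 performance.

```python
# Sanity check on the quoted Solstice/Equinox figures.
solstice_gpus = 100_000
equinox_gpus = 10_000
combined_exaflops = 2_200          # quoted combined AI performance

total_gpus = solstice_gpus + equinox_gpus
per_gpu_petaflops = combined_exaflops * 1_000 / total_gpus  # 1 exaflop = 1,000 petaflops

print(f"Implied per-GPU AI performance: {per_gpu_petaflops:.0f} PFLOPS")
# ~20 PFLOPS per GPU, which is only plausible for low-precision (FP4-class) arithmetic.
```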


Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.
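For scale, the growth multiples implied by the figures above are simple arithmetic on the article’s own numbers:

```python
# Growth multiples implied by the capex figures quoted above (all in $B).
msft_fy2025_plan = 80.0      # Smith's $80B fiscal-2025 figure
msft_cal2025_est = 62.4      # Bloomberg Intelligence calendar-2025 estimate
msft_2020_capex = 17.6       # Microsoft's 2020 capital expenditure
group_2025_est = 200.0       # estimated 2025 capex across the six companies
group_2023 = 110.0           # the same group's 2023 capex

print(f"Microsoft fiscal-2025 plan vs. 2020 capex: {msft_fy2025_plan / msft_2020_capex:.1f}x")
print(f"Bloomberg calendar-2025 estimate vs. 2020: {msft_cal2025_est / msft_2020_capex:.1f}x")
print(f"Six-company 2025 estimate vs. 2023:        {group_2025_est / group_2023:.1f}x")
```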


John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction, and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it has been a regular presence as a non-tech company showing off technology at the big tech trade show in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.)

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do


2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation
AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies and recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to
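As a rough illustration of the LLM-as-judge pattern mentioned above, here is a minimal sketch. The `call_model` helper is a hypothetical stand-in for whatever provider client you actually use, and the model names and grading prompt are illustrative only.

```python
# Minimal sketch of the "LLM as a judge" pattern: draft with one model,
# then have a small panel of cheaper models score the draft and average the scores.
from statistics import mean

def call_model(model: str, prompt: str) -> str:
    # Hypothetical helper: wire this to your model provider of choice.
    raise NotImplementedError

def judge_answer(question: str, answer: str, judge_models: list[str]) -> float:
    """Ask several judge models to score an answer on a 1-10 scale and average the results."""
    scores = []
    for judge in judge_models:
        verdict = call_model(
            judge,
            "Rate the following answer to the question on a 1-10 scale. Reply with a number only.\n"
            f"Question: {question}\nAnswer: {answer}",
        )
        scores.append(float(verdict.strip()))
    return mean(scores)

# Usage (illustrative):
# draft = call_model("agent-model", "Summarize our Q3 churn drivers.")
# score = judge_answer("Summarize our Q3 churn drivers.", draft, ["judge-a", "judge-b", "judge-c"])
```

Using three or more judges, as the passage suggests, smooths out any single model’s grading quirks at a modest cost once the judge models are cheap enough.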


OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement learning and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability, and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends
It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, and OpenAI, as well as the U.S. National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases, and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle
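To make the automated approach more concrete, here is a conceptual sketch of one round of such a loop. This is not OpenAI’s implementation: the `attacker`, `target`, and `reward_fn` objects are placeholders, and the system described in the paper uses auto-generated rewards and multi-step reinforcement learning rather than the simple filter-and-keep heuristic shown here.

```python
# Conceptual sketch of one automated red-teaming round: an attacker model proposes
# prompts, a reward function scores how effectively each prompt elicits unsafe or
# policy-violating behavior from the target (ideally rewarding novelty as well),
# and the highest-scoring attacks seed the next round.

def red_team_round(attacker, target, reward_fn, seed_prompts, keep_top=10):
    """Run one round: mutate seeds, attack the target, keep the best attacks."""
    candidates = [attacker.propose(seed) for seed in seed_prompts]
    scored = []
    for attack_prompt in candidates:
        response = target.respond(attack_prompt)
        scored.append((reward_fn(attack_prompt, response), attack_prompt))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [prompt for _, prompt in scored[:keep_top]]
```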
