Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation

Stay Ahead, Stay ONMINE

Overcome Failing Document Ingestion & RAG Strategies with Agentic Knowledge Distillation

Introduction Many generative AI use cases still revolve around Retrieval Augmented Generation (RAG), yet consistently fall short of user expectations. Despite the growing body of research on RAG improvements and even adding Agents into the process, many solutions still fail to return exhaustive results, miss information that is critical but infrequently mentioned in the documents, require multiple search iterations, and generally struggle to reconcile key themes across multiple documents. To top it all off, many implementations still rely on cramming as much “relevant” information as possible into the model’s context window alongside detailed system and user prompts. Reconciling all this information often exceeds the model’s cognitive capacity and compromises response quality and consistency. This is where our Agentic Knowledge Distillation + Pyramid Search Approach comes into play. Instead of chasing the best chunking strategy, retrieval algorithm, or inference-time reasoning method, my team, Jim Brown, Mason Sawtell, Sandi Besen, and I, take an agentic approach to document ingestion. We leverage the full capability of the model at ingestion time to focus exclusively on distilling and preserving the most meaningful information from the document dataset. This fundamentally simplifies the RAG process by allowing the model to direct its reasoning abilities toward addressing the user/system instructions rather than struggling to understand formatting and disparate information across document chunks. We specifically target high-value questions that are often difficult to evaluate because they have multiple correct answers or solution paths. These cases are where traditional RAG solutions struggle most and existing RAG evaluation datasets are largely insufficient for testing this problem space. For our research implementation, we downloaded annual and quarterly reports from the last year for the 30 companies in the DOW Jones Industrial Average. These documents can be found through the SEC EDGAR website. The information on EDGAR is accessible and able to be downloaded for free or can be queried through EDGAR public searches. See the SEC privacy policy for additional details, information on the SEC website is “considered public information and may be copied or further distributed by users of the web site without the SEC’s permission”. We selected this dataset for two key reasons: first, it falls outside the knowledge cutoff for the models evaluated, ensuring that the models cannot respond to questions based on their knowledge from pre-training; second, it’s a close approximation for real-world business problems while allowing us to discuss and share our findings using publicly available data. While typical RAG solutions excel at factual retrieval where the answer is easily identified in the document dataset (e.g., “When did Apple’s annual shareholder’s meeting occur?”), they struggle with nuanced questions that require a deeper understanding of concepts across documents (e.g., “Which of the DOW companies has the most promising AI strategy?”). Our Agentic Knowledge Distillation + Pyramid Search Approach addresses these types of questions with much greater success compared to other standard approaches we tested and overcomes limitations associated with using knowledge graphs in RAG systems. In this article, we’ll cover how our knowledge distillation process works, key benefits of this approach, examples, and an open discussion on the best way to evaluate these types of systems where, in many cases, there is no singular “right” answer. Building the pyramid: How Agentic Knowledge Distillation works Image by author and team depicting pyramid structure for document ingestion. Robots meant to represent agents building the pyramid. Overview Our knowledge distillation process creates a multi-tiered pyramid of information from the raw source documents. Our approach is inspired by the pyramids used in deep learning computer vision-based tasks, which allow a model to analyze an image at multiple scales. We take the contents of the raw document, convert it to markdown, and distill the content into a list of atomic insights, related concepts, document abstracts, and general recollections/memories. During retrieval it’s possible to access any or all levels of the pyramid to respond to the user request. How to distill documents and build the pyramid: Convert documents to Markdown: Convert all raw source documents to Markdown. We’ve found models process markdown best for this task compared to other formats like JSON and it is more token efficient. We used Azure Document Intelligence to generate the markdown for each page of the document, but there are many other open-source libraries like MarkItDown which do the same thing. Our dataset included 331 documents and 16,601 pages. Extract atomic insights from each page: We process documents using a two-page sliding window, which allows each page to be analyzed twice. This gives the agent the opportunity to correct any potential mistakes when processing the page initially. We instruct the model to create a numbered list of insights that grows as it processes the pages in the document. The agent can overwrite insights from the previous page if they were incorrect since it sees each page twice. We instruct the model to extract insights in simple sentences following the subject-verb-object (SVO) format and to write sentences as if English is the second language of the user. This significantly improves performance by encouraging clarity and precision. Rolling over each page multiple times and using the SVO format also solves the disambiguation problem, which is a huge challenge for knowledge graphs. The insight generation step is also particularly helpful for extracting information from tables since the model captures the facts from the table in clear, succinct sentences. Our dataset produced 216,931 total insights, about 13 insights per page and 655 insights per document. Distilling concepts from insights: From the detailed list of insights, we identify higher-level concepts that connect related information about the document. This step significantly reduces noise and redundant information in the document while preserving essential information and themes. Our dataset produced 14,824 total concepts, about 1 concept per page and 45 concepts per document. Creating abstracts from concepts: Given the insights and concepts in the document, the LLM writes an abstract that appears both better than any abstract a human would write and more information-dense than any abstract present in the original document. The LLM generated abstract provides incredibly comprehensive knowledge about the document with a small token density that carries a significant amount of information. We produce one abstract per document, 331 total. Storing recollections/memories across documents: At the top of the pyramid we store critical information that is useful across all tasks. This can be information that the user shares about the task or information the agent learns about the dataset over time by researching and responding to tasks. For example, we can store the current 30 companies in the DOW as a recollection since this list is different from the 30 companies in the DOW at the time of the model’s knowledge cutoff. As we conduct more and more research tasks, we can continuously improve our recollections and maintain an audit trail of which documents these recollections originated from. For example, we can keep track of AI strategies across companies, where companies are making major investments, etc. These high-level connections are super important since they reveal relationships and information that are not apparent in a single page or document. Sample subset of insights extracted from IBM 10Q, Q3 2024 (page 4) We store the text and embeddings for each layer of the pyramid (pages and up) in Azure PostgreSQL. We originally used Azure AI Search, but switched to PostgreSQL for cost reasons. This required us to write our own hybrid search function since PostgreSQL doesn’t yet natively support this feature. This implementation would work with any vector database or vector index of your choosing. The key requirement is to store and efficiently retrieve both text and vector embeddings at any level of the pyramid. This approach essentially creates the essence of a knowledge graph, but stores information in natural language, the way an LLM natively wants to interact with it, and is more efficient on token retrieval. We also let the LLM pick the terms used to categorize each level of the pyramid, this seemed to let the model decide for itself the best way to describe and differentiate between the information stored at each level. For example, the LLM preferred “insights” to “facts” as the label for the first level of distilled knowledge. Our goal in doing this was to better understand how an LLM thinks about the process by letting it decide how to store and group related information. Using the pyramid: How it works with RAG & Agents At inference time, both traditional RAG and agentic approaches benefit from the pre-processed, distilled information ingested in our knowledge pyramid. The pyramid structure allows for efficient retrieval in both the traditional RAG case, where only the top X related pieces of information are retrieved or in the Agentic case, where the Agent iteratively plans, retrieves, and evaluates information before returning a final response. The benefit of the pyramid approach is that information at any and all levels of the pyramid can be used during inference. For our implementation, we used PydanticAI to create a search agent that takes in the user request, generates search terms, explores ideas related to the request, and keeps track of information relevant to the request. Once the search agent determines there’s sufficient information to address the user request, the results are re-ranked and sent back to the LLM to generate a final reply. Our implementation allows a search agent to traverse the information in the pyramid as it gathers details about a concept/search term. This is similar to walking a knowledge graph, but in a way that’s more natural for the LLM since all the information in the pyramid is stored in natural language. Depending on the use case, the Agent could access information at all levels of the pyramid or only at specific levels (e.g. only retrieve information from the concepts). For our experiments, we did not retrieve raw page-level data since we wanted to focus on token efficiency and found the LLM-generated information for the insights, concepts, abstracts, and recollections was sufficient for completing our tasks. In theory, the Agent could also have access to the page data; this would provide additional opportunities for the agent to re-examine the original document text; however, it would also significantly increase the total tokens used. Here is a high-level visualization of our Agentic approach to responding to user requests: Image created by author and team providing an overview of the agentic research & response process Results from the pyramid: Real-world examples To evaluate the effectiveness of our approach, we tested it against a variety of question categories, including typical fact-finding questions and complex cross-document research and analysis tasks. Fact-finding (spear fishing): These tasks require identifying specific information or facts that are buried in a document. These are the types of questions typical RAG solutions target but often require many searches and consume lots of tokens to answer correctly. Example task: “What was IBM’s total revenue in the latest financial reporting?” Example response using pyramid approach: “IBM’s total revenue for the third quarter of 2024 was $14.968 billion [ibm-10q-q3-2024.pdf, pg. 4] Total tokens used to research and generate response This result is correct (human-validated) and was generated using only 9,994 total tokens, with 1,240 tokens in the generated final response. Complex research and analysis: These tasks involve researching and understanding multiple concepts to gain a broader understanding of the documents and make inferences and informed assumptions based on the gathered facts. Example task: “Analyze the investments Microsoft and NVIDIA are making in AI and how they are positioning themselves in the market. The report should be clearly formatted.” Example response: Response generated by the agent analyzing AI investments and positioning for Microsoft and NVIDIA. The result is a comprehensive report that executed quickly and contains detailed information about each of the companies. 26,802 total tokens were used to research and respond to the request with a significant percentage of them used for the final response (2,893 tokens or ~11%). These results were also reviewed by a human to verify their validity. Snippet indicating total token usage for the task Example task: “Create a report on analyzing the risks disclosed by the various financial companies in the DOW. Indicate which risks are shared and unique.” Example response: Part 1 of response generated by the agent on disclosed risks. Part 2 of response generated by the agent on disclosed risks. Similarly, this task was completed in 42.7 seconds and used 31,685 total tokens, with 3,116 tokens used to generate the final report. Snippet indicating total token usage for the task These results for both fact-finding and complex analysis tasks demonstrate that the pyramid approach efficiently creates detailed reports with low latency using a minimal amount of tokens. The tokens used for the tasks carry dense meaning with little noise allowing for high-quality, thorough responses across tasks. Benefits of the pyramid: Why use it? Overall, we found that our pyramid approach provided a significant boost in response quality and overall performance for high-value questions. Some of the key benefits we observed include: Reduced model’s cognitive load: When the agent receives the user task, it retrieves pre-processed, distilled information rather than the raw, inconsistently formatted, disparate document chunks. This fundamentally improves the retrieval process since the model doesn’t waste its cognitive capacity on trying to break down the page/chunk text for the first time. Superior table processing: By breaking down table information and storing it in concise but descriptive sentences, the pyramid approach makes it easier to retrieve relevant information at inference time through natural language queries. This was particularly important for our dataset since financial reports contain lots of critical information in tables. Improved response quality to many types of requests: The pyramid enables more comprehensive context-aware responses to both precise, fact-finding questions and broad analysis based tasks that involve many themes across numerous documents. Preservation of critical context: Since the distillation process identifies and keeps track of key facts, important information that might appear only once in the document is easier to maintain. For example, noting that all tables are represented in millions of dollars or in a particular currency. Traditional chunking methods often cause this type of information to slip through the cracks. Optimized token usage, memory, and speed: By distilling information at ingestion time, we significantly reduce the number of tokens required during inference, are able to maximize the value of information put in the context window, and improve memory use. Scalability: Many solutions struggle to perform as the size of the document dataset grows. This approach provides a much more efficient way to manage a large volume of text by only preserving critical information. This also allows for a more efficient use of the LLMs context window by only sending it useful, clear information. Efficient concept exploration: The pyramid enables the agent to explore related information similar to navigating a knowledge graph, but does not require ever generating or maintaining relationships in the graph. The agent can use natural language exclusively and keep track of important facts related to the concepts it’s exploring in a highly token-efficient and fluid way. Emergent dataset understanding: An unexpected benefit of this approach emerged during our testing. When asking questions like “what can you tell me about this dataset?” or “what types of questions can I ask?”, the system is able to respond and suggest productive search topics because it has a more robust understanding of the dataset context by accessing higher levels in the pyramid like the abstracts and recollections. Beyond the pyramid: Evaluation challenges & future directions Challenges While the results we’ve observed when using the pyramid search approach have been nothing short of amazing, finding ways to establish meaningful metrics to evaluate the entire system both at ingestion time and during information retrieval is challenging. Traditional RAG and Agent evaluation frameworks often fail to address nuanced questions and analytical responses where many different responses are valid. Our team plans to write a research paper on this approach in the future, and we are open to any thoughts and feedback from the community, especially when it comes to evaluation metrics. Many of the existing datasets we found were focused on evaluating RAG use cases within one document or precise information retrieval across multiple documents rather than robust concept and theme analysis across documents and domains. The main use cases we are interested in relate to broader questions that are representative of how businesses actually want to interact with GenAI systems. For example, “tell me everything I need to know about customer X” or “how do the behaviors of Customer A and B differ? Which am I more likely to have a successful meeting with?”. These types of questions require a deep understanding of information across many sources. The answers to these questions typically require a person to synthesize data from multiple areas of the business and think critically about it. As a result, the answers to these questions are rarely written or saved anywhere which makes it impossible to simply store and retrieve them through a vector index in a typical RAG process. Another consideration is that many real-world use cases involve dynamic datasets where documents are consistently being added, edited, and deleted. This makes it difficult to evaluate and track what a “correct” response is since the answer will evolve as the available information changes. Future directions In the future, we believe that the pyramid approach can address some of these challenges by enabling more effective processing of dense documents and storing learned information as recollections. However, tracking and evaluating the validity of the recollections over time will be critical to the system’s overall success and remains a key focus area for our ongoing work. When applying this approach to organizational data, the pyramid process could also be used to identify and assess discrepancies across areas of the business. For example, uploading all of a company’s sales pitch decks could surface where certain products or services are being positioned inconsistently. It could also be used to compare insights extracted from various line of business data to help understand if and where teams have developed conflicting understandings of topics or different priorities. This application goes beyond pure information retrieval use cases and would allow the pyramid to serve as an organizational alignment tool that helps identify divergences in messaging, terminology, and overall communication. Conclusion: Key takeaways and why the pyramid approach matters The knowledge distillation pyramid approach is significant because it leverages the full power of the LLM at both ingestion and retrieval time. Our approach allows you to store dense information in fewer tokens which has the added benefit of reducing noise in the dataset at inference. Our approach also runs very quickly and is incredibly token efficient, we are able to generate responses within seconds, explore potentially hundreds of searches, and on average use

Introduction

Many generative AI use cases still revolve around Retrieval Augmented Generation (RAG), yet consistently fall short of user expectations. Despite the growing body of research on RAG improvements and even adding Agents into the process, many solutions still fail to return exhaustive results, miss information that is critical but infrequently mentioned in the documents, require multiple search iterations, and generally struggle to reconcile key themes across multiple documents. To top it all off, many implementations still rely on cramming as much “relevant” information as possible into the model’s context window alongside detailed system and user prompts. Reconciling all this information often exceeds the model’s cognitive capacity and compromises response quality and consistency.

This is where our Agentic Knowledge Distillation + Pyramid Search Approach comes into play. Instead of chasing the best chunking strategy, retrieval algorithm, or inference-time reasoning method, my team, Jim Brown, Mason Sawtell, Sandi Besen, and I, take an agentic approach to document ingestion.

We leverage the full capability of the model at ingestion time to focus exclusively on distilling and preserving the most meaningful information from the document dataset. This fundamentally simplifies the RAG process by allowing the model to direct its reasoning abilities toward addressing the user/system instructions rather than struggling to understand formatting and disparate information across document chunks.

We specifically target high-value questions that are often difficult to evaluate because they have multiple correct answers or solution paths. These cases are where traditional RAG solutions struggle most and existing RAG evaluation datasets are largely insufficient for testing this problem space. For our research implementation, we downloaded annual and quarterly reports from the last year for the 30 companies in the DOW Jones Industrial Average. These documents can be found through the SEC EDGAR website. The information on EDGAR is accessible and able to be downloaded for free or can be queried through EDGAR public searches. See the SEC privacy policy for additional details, information on the SEC website is “considered public information and may be copied or further distributed by users of the web site without the SEC’s permission”. We selected this dataset for two key reasons: first, it falls outside the knowledge cutoff for the models evaluated, ensuring that the models cannot respond to questions based on their knowledge from pre-training; second, it’s a close approximation for real-world business problems while allowing us to discuss and share our findings using publicly available data.

While typical RAG solutions excel at factual retrieval where the answer is easily identified in the document dataset (e.g., “When did Apple’s annual shareholder’s meeting occur?”), they struggle with nuanced questions that require a deeper understanding of concepts across documents (e.g., “Which of the DOW companies has the most promising AI strategy?”). Our Agentic Knowledge Distillation + Pyramid Search Approach addresses these types of questions with much greater success compared to other standard approaches we tested and overcomes limitations associated with using knowledge graphs in RAG systems.

In this article, we’ll cover how our knowledge distillation process works, key benefits of this approach, examples, and an open discussion on the best way to evaluate these types of systems where, in many cases, there is no singular “right” answer.

Building the pyramid: How Agentic Knowledge Distillation works

AI-generated image showing a pyramid structure for document ingestion with labelled sections. — Image by author and team depicting pyramid structure for document ingestion. Robots meant to represent agents building the pyramid.

Overview

Our knowledge distillation process creates a multi-tiered pyramid of information from the raw source documents. Our approach is inspired by the pyramids used in deep learning computer vision-based tasks, which allow a model to analyze an image at multiple scales. We take the contents of the raw document, convert it to markdown, and distill the content into a list of atomic insights, related concepts, document abstracts, and general recollections/memories. During retrieval it’s possible to access any or all levels of the pyramid to respond to the user request.

How to distill documents and build the pyramid:

Convert documents to Markdown: Convert all raw source documents to Markdown. We’ve found models process markdown best for this task compared to other formats like JSON and it is more token efficient. We used Azure Document Intelligence to generate the markdown for each page of the document, but there are many other open-source libraries like MarkItDown which do the same thing. Our dataset included 331 documents and 16,601 pages.
Extract atomic insights from each page: We process documents using a two-page sliding window, which allows each page to be analyzed twice. This gives the agent the opportunity to correct any potential mistakes when processing the page initially. We instruct the model to create a numbered list of insights that grows as it processes the pages in the document. The agent can overwrite insights from the previous page if they were incorrect since it sees each page twice. We instruct the model to extract insights in simple sentences following the subject-verb-object (SVO) format and to write sentences as if English is the second language of the user. This significantly improves performance by encouraging clarity and precision. Rolling over each page multiple times and using the SVO format also solves the disambiguation problem, which is a huge challenge for knowledge graphs. The insight generation step is also particularly helpful for extracting information from tables since the model captures the facts from the table in clear, succinct sentences. Our dataset produced 216,931 total insights, about 13 insights per page and 655 insights per document.
Distilling concepts from insights: From the detailed list of insights, we identify higher-level concepts that connect related information about the document. This step significantly reduces noise and redundant information in the document while preserving essential information and themes. Our dataset produced 14,824 total concepts, about 1 concept per page and 45 concepts per document.
Creating abstracts from concepts: Given the insights and concepts in the document, the LLM writes an abstract that appears both better than any abstract a human would write and more information-dense than any abstract present in the original document. The LLM generated abstract provides incredibly comprehensive knowledge about the document with a small token density that carries a significant amount of information. We produce one abstract per document, 331 total.
Storing recollections/memories across documents: At the top of the pyramid we store critical information that is useful across all tasks. This can be information that the user shares about the task or information the agent learns about the dataset over time by researching and responding to tasks. For example, we can store the current 30 companies in the DOW as a recollection since this list is different from the 30 companies in the DOW at the time of the model’s knowledge cutoff. As we conduct more and more research tasks, we can continuously improve our recollections and maintain an audit trail of which documents these recollections originated from. For example, we can keep track of AI strategies across companies, where companies are making major investments, etc. These high-level connections are super important since they reveal relationships and information that are not apparent in a single page or document.

Sample subset of insights extracted from IBM 10Q, Q3 2024 (page 4)

We store the text and embeddings for each layer of the pyramid (pages and up) in Azure PostgreSQL. We originally used Azure AI Search, but switched to PostgreSQL for cost reasons. This required us to write our own hybrid search function since PostgreSQL doesn’t yet natively support this feature. This implementation would work with any vector database or vector index of your choosing. The key requirement is to store and efficiently retrieve both text and vector embeddings at any level of the pyramid.

This approach essentially creates the essence of a knowledge graph, but stores information in natural language, the way an LLM natively wants to interact with it, and is more efficient on token retrieval. We also let the LLM pick the terms used to categorize each level of the pyramid, this seemed to let the model decide for itself the best way to describe and differentiate between the information stored at each level. For example, the LLM preferred “insights” to “facts” as the label for the first level of distilled knowledge. Our goal in doing this was to better understand how an LLM thinks about the process by letting it decide how to store and group related information.

Using the pyramid: How it works with RAG & Agents

At inference time, both traditional RAG and agentic approaches benefit from the pre-processed, distilled information ingested in our knowledge pyramid. The pyramid structure allows for efficient retrieval in both the traditional RAG case, where only the top X related pieces of information are retrieved or in the Agentic case, where the Agent iteratively plans, retrieves, and evaluates information before returning a final response.

The benefit of the pyramid approach is that information at any and all levels of the pyramid can be used during inference. For our implementation, we used PydanticAI to create a search agent that takes in the user request, generates search terms, explores ideas related to the request, and keeps track of information relevant to the request. Once the search agent determines there’s sufficient information to address the user request, the results are re-ranked and sent back to the LLM to generate a final reply. Our implementation allows a search agent to traverse the information in the pyramid as it gathers details about a concept/search term. This is similar to walking a knowledge graph, but in a way that’s more natural for the LLM since all the information in the pyramid is stored in natural language.

Depending on the use case, the Agent could access information at all levels of the pyramid or only at specific levels (e.g. only retrieve information from the concepts). For our experiments, we did not retrieve raw page-level data since we wanted to focus on token efficiency and found the LLM-generated information for the insights, concepts, abstracts, and recollections was sufficient for completing our tasks. In theory, the Agent could also have access to the page data; this would provide additional opportunities for the agent to re-examine the original document text; however, it would also significantly increase the total tokens used.

Here is a high-level visualization of our Agentic approach to responding to user requests:

Overview of the agentic research & response process — Image created by author and team providing an overview of the agentic research & response process

Results from the pyramid: Real-world examples

To evaluate the effectiveness of our approach, we tested it against a variety of question categories, including typical fact-finding questions and complex cross-document research and analysis tasks.

Fact-finding (spear fishing):

These tasks require identifying specific information or facts that are buried in a document. These are the types of questions typical RAG solutions target but often require many searches and consume lots of tokens to answer correctly.

Example task: “What was IBM’s total revenue in the latest financial reporting?”

Example response using pyramid approach: “IBM’s total revenue for the third quarter of 2024 was $14.968 billion [ibm-10q-q3-2024.pdf, pg. 4]

Screenshot of total tokens used to research and generate response — Total tokens used to research and generate response

This result is correct (human-validated) and was generated using only 9,994 total tokens, with 1,240 tokens in the generated final response.

Complex research and analysis:

These tasks involve researching and understanding multiple concepts to gain a broader understanding of the documents and make inferences and informed assumptions based on the gathered facts.

Example task: “Analyze the investments Microsoft and NVIDIA are making in AI and how they are positioning themselves in the market. The report should be clearly formatted.”

Example response:

Screenshot of the response generated by the agent analyzing AI investments and positioning for Microsoft and NVIDIA. — Response generated by the agent analyzing AI investments and positioning for Microsoft and NVIDIA.

The result is a comprehensive report that executed quickly and contains detailed information about each of the companies. 26,802 total tokens were used to research and respond to the request with a significant percentage of them used for the final response (2,893 tokens or ~11%). These results were also reviewed by a human to verify their validity.

Screenshot of snippet indicating total token usage for the task — Snippet indicating total token usage for the task

Example task: “Create a report on analyzing the risks disclosed by the various financial companies in the DOW. Indicate which risks are shared and unique.”

Example response:

Screenshot of part 1 of a response generated by the agent on disclosed risks. — Part 1 of response generated by the agent on disclosed risks.

Screenshot of part 2 of a response generated by the agent on disclosed risks. — Part 2 of response generated by the agent on disclosed risks.

Similarly, this task was completed in 42.7 seconds and used 31,685 total tokens, with 3,116 tokens used to generate the final report.

Screenshot of a snippet indicating total token usage for the task — Snippet indicating total token usage for the task

These results for both fact-finding and complex analysis tasks demonstrate that the pyramid approach efficiently creates detailed reports with low latency using a minimal amount of tokens. The tokens used for the tasks carry dense meaning with little noise allowing for high-quality, thorough responses across tasks.

Benefits of the pyramid: Why use it?

Overall, we found that our pyramid approach provided a significant boost in response quality and overall performance for high-value questions.

Some of the key benefits we observed include:

Reduced model’s cognitive load: When the agent receives the user task, it retrieves pre-processed, distilled information rather than the raw, inconsistently formatted, disparate document chunks. This fundamentally improves the retrieval process since the model doesn’t waste its cognitive capacity on trying to break down the page/chunk text for the first time.
Superior table processing: By breaking down table information and storing it in concise but descriptive sentences, the pyramid approach makes it easier to retrieve relevant information at inference time through natural language queries. This was particularly important for our dataset since financial reports contain lots of critical information in tables.
Improved response quality to many types of requests: The pyramid enables more comprehensive context-aware responses to both precise, fact-finding questions and broad analysis based tasks that involve many themes across numerous documents.
Preservation of critical context: Since the distillation process identifies and keeps track of key facts, important information that might appear only once in the document is easier to maintain. For example, noting that all tables are represented in millions of dollars or in a particular currency. Traditional chunking methods often cause this type of information to slip through the cracks.
Optimized token usage, memory, and speed: By distilling information at ingestion time, we significantly reduce the number of tokens required during inference, are able to maximize the value of information put in the context window, and improve memory use.
Scalability: Many solutions struggle to perform as the size of the document dataset grows. This approach provides a much more efficient way to manage a large volume of text by only preserving critical information. This also allows for a more efficient use of the LLMs context window by only sending it useful, clear information.
Efficient concept exploration: The pyramid enables the agent to explore related information similar to navigating a knowledge graph, but does not require ever generating or maintaining relationships in the graph. The agent can use natural language exclusively and keep track of important facts related to the concepts it’s exploring in a highly token-efficient and fluid way.
Emergent dataset understanding: An unexpected benefit of this approach emerged during our testing. When asking questions like “what can you tell me about this dataset?” or “what types of questions can I ask?”, the system is able to respond and suggest productive search topics because it has a more robust understanding of the dataset context by accessing higher levels in the pyramid like the abstracts and recollections.

Beyond the pyramid: Evaluation challenges & future directions

Challenges

While the results we’ve observed when using the pyramid search approach have been nothing short of amazing, finding ways to establish meaningful metrics to evaluate the entire system both at ingestion time and during information retrieval is challenging. Traditional RAG and Agent evaluation frameworks often fail to address nuanced questions and analytical responses where many different responses are valid.

Our team plans to write a research paper on this approach in the future, and we are open to any thoughts and feedback from the community, especially when it comes to evaluation metrics. Many of the existing datasets we found were focused on evaluating RAG use cases within one document or precise information retrieval across multiple documents rather than robust concept and theme analysis across documents and domains.

The main use cases we are interested in relate to broader questions that are representative of how businesses actually want to interact with GenAI systems. For example, “tell me everything I need to know about customer X” or “how do the behaviors of Customer A and B differ? Which am I more likely to have a successful meeting with?”. These types of questions require a deep understanding of information across many sources. The answers to these questions typically require a person to synthesize data from multiple areas of the business and think critically about it. As a result, the answers to these questions are rarely written or saved anywhere which makes it impossible to simply store and retrieve them through a vector index in a typical RAG process.

Another consideration is that many real-world use cases involve dynamic datasets where documents are consistently being added, edited, and deleted. This makes it difficult to evaluate and track what a “correct” response is since the answer will evolve as the available information changes.

Future directions

In the future, we believe that the pyramid approach can address some of these challenges by enabling more effective processing of dense documents and storing learned information as recollections. However, tracking and evaluating the validity of the recollections over time will be critical to the system’s overall success and remains a key focus area for our ongoing work.

When applying this approach to organizational data, the pyramid process could also be used to identify and assess discrepancies across areas of the business. For example, uploading all of a company’s sales pitch decks could surface where certain products or services are being positioned inconsistently. It could also be used to compare insights extracted from various line of business data to help understand if and where teams have developed conflicting understandings of topics or different priorities. This application goes beyond pure information retrieval use cases and would allow the pyramid to serve as an organizational alignment tool that helps identify divergences in messaging, terminology, and overall communication.

Conclusion: Key takeaways and why the pyramid approach matters

The knowledge distillation pyramid approach is significant because it leverages the full power of the LLM at both ingestion and retrieval time. Our approach allows you to store dense information in fewer tokens which has the added benefit of reducing noise in the dataset at inference. Our approach also runs very quickly and is incredibly token efficient, we are able to generate responses within seconds, explore potentially hundreds of searches, and on average use (this includes all the search iterations!).

We find that the LLM is much better at writing atomic insights as sentences and that these insights effectively distill information from both text-based and tabular data. This distilled information written in natural language is very easy for the LLM to understand and navigate at inference since it does not have to expend unnecessary energy reasoning about and breaking down document formatting or filtering through noise.

The ability to retrieve and aggregate information at any level of the pyramid also provides significant flexibility to address a variety of query types. This approach offers promising performance for large datasets and enables high-value use cases that require nuanced information retrieval and analysis.

Note: The opinions expressed in this article are solely my own and do not necessarily reflect the views or policies of my employer.

Interested in discussing further or collaborating? Reach out on LinkedIn!

Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy, bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Oncor to propose rate increase, says 5-year capital expenditures could ‘soar’ by $12B

Dive Brief: Oncor Electric said Thursday that it will ask Texas regulators to increase its base rates by approximately $834 million to support Texas investments that include the utility’s $36 billion five-year capital spending plan — though that spending figure is likely to climb, officials said. In February, Oncor announced a

Chronosphere unveils logging package with cost control features

According to a study by Chronosphere, enterprise log data is growing at 250% year-over-year, and Chronosphere Logs helps engineers and observability teams to resolve incidents faster while controlling costs. The usage and volume analysis and proactive recommendations can help reduce data before it’s stored, the company says. “Organizations are drowning

Cisco CIO on the future of IT: AI, simplicity, and employee power

AI can democratize access to information to deliver a “white-glove experience” once reserved for senior executives, Previn said. That might include, for example, real-time information retrieval and intelligent process execution for every employee. “Usually, in a large company, you’ve got senior executives, and you’ve got early career hires, and it’s

AMI MegaRAC authentication bypass flaw is being exploitated, CISA warns

The spoofing attack works by manipulating HTTP request headers sent to the Redfish interface. Attackers can add specific values to headers like “X-Server-Addr” to make their external requests appear as if they’re coming from inside the server itself. Since the system automatically trusts internal requests as authenticated, this spoofing technique

Israeli Gas Flows to Egypt Return to Normal as Iran Truce Holds

Israeli natural gas flows to Egypt returned to normal levels after a truce with Iran allowed the Jewish state to reopen facilities shuttered by the 12-day conflict. Daily exports have climbed to 1 billion cubic feet per day, according to two people with direct knowledge of the situation. That’s up from 260 million cubic feet when Israel’s Leviathan gas field, the country’s biggest, restarted on Wednesday, they said, declining to be identified because they’re not authorized to speak to the media. The increased flows have let Egyptian authorities resume supplies to some factories that had been halted because of the shortages. Israel temporarily closed two of its three gas fields – Chevron-operated Leviathan and Energean’s Karish – shortly after launching attacks on Iran on June 13. The facilities that provided the bulk of exports to Egypt and Jordan resumed operations last week after a US-brokered ceasefire with the Islamic Republic took hold. The ramped-up supplies are a relief for Cairo, which has swung from a net exporter to importer of natural gas in recent years. As Israel and Iran traded blows, Egypt enacted contingency plans that included seeking alternative fuel purchases, limiting gas to some industries and switching power stations to fuel oil and diesel to maintain electricity output. What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is a new social experience created for you and all energy professionals to Speak Up about our industry, share knowledge, connect with peers and industry insiders and engage in a professional community that will empower your career in energy.

California Regulator Wants to Pause Newsom Refinery Profit Cap

California’s energy market regulator is backing off a plan to place a profit cap on oil refiners in the state. Siva Gunda, vice chair of the California Energy Commission, said during a Friday briefing that the cap would “serve as a deterrent” to refiners boosting investments in the state. Gunda said the commission wants to increase gasoline supply in California after two refineries announced plans to close in the next year, accounting for about one-fifth of the state’s crude-processing capacity. The recommendation marks a reversal from years of regulatory scrutiny by Governor Gavin Newsom and the California Energy Commission that contributed to plans by Phillips 66 and Valero Energy Corp. to shut their refineries. The closings prompted Newsom to adjust course in April and urge the energy regulator to collaborate with fuel makers to ensure affordable and reliable supply. Gunda wrote in a Friday letter to Newsom that the commission should pause implementation of a profit margin cap and focus on fuel resupply strategies instead. It comes more than two years after Newsom and state lawmakers gave the energy commission authority to determine a profit margin on refiners and impose financial penalties for violations. The state will be looking to increase fuel imports to make up for the loss of refining capacity, Gunda said. In the short term, California gas prices could rise 15 to 30 cents a gallon because of the loss of production, he said. A spokesperson for the energy commission said the estimated price increases would be mitigated by the plan presented on Friday. Californians already pay the highest gasoline prices in the country. Wade Crowfoot, secretary of the California Natural Resources Agency, said residents want the state to transition away from oil and gas yet they need to prevent cost spikes. “We get it,” he said.

State utility regulators urge FERC to slash ROE transmission incentive

Utility regulators from about 35 states are urging the Federal Energy Regulatory Commission to sharply limit a 0.5% return on equity incentive the agency gives to utilities that join regional transmission organizations. “The time has come for the Commission to eliminate its policy of granting the RTO Participation Adder in perpetuity, if not to eliminate this incentive altogether,” the Organization of PJM States, the Organization of MISO States, the New England States Committee on Electricity and the Southwest Power Pool Regional State Committee said in a Friday letter to FERC. The state regulators and others contends the RTO incentive adds millions to ratepayer costs to encourage behavior — being an RTO member — that they would likely do anyway. In 2021, FERC proposed limiting its ROE adder to three years. FERC Chairman Mark Christie supports the proposal as well as limiting other incentives aimed at encouraging utilities to build transmission lines. However, it appears he has been unable to convince a majority of FERC commissioners to reduce those incentives. Christie’s term ends today, although he plans to stay at the agency until at least FERC’s next open meeting on July 24. Limiting the ROE incentive could reduce utility income. Public Service Enterprise Group, for example, estimates that removing the incentive would cut annual net income and cash inflows by about $40 million for its Public Service Electric & Gas subsidiary, according to a Feb. 25 filing at the U.S. Securities and Exchange Commission. The utility earned about $1.5 billion in 2024. Ending the incentive would reduce American Electric Power’s pretax income by $35 million to $50 million a year, the utility company said in its 2023 annual report with the SEC. In April, WIRES, a transmission-focused trade group, the Edison Electric Institute, which represents investor-owned utilities, and GridWise Alliance, a

Affordability a ‘formidable challenge’ as load shifts to tech, industrial customers: ICF

Dive Brief: Keeping electricity affordable for consumers is a “formidable challenge” amid projections of declining generation capacity reserves and persistent uncertainty around the scale and pace of future load growth, ICF International Vice President of Energy Markets Maria Scheller said Thursday. Meanwhile, broad policy uncertainty and an increasingly shaky regulatory environment give utilities and capital markets pause about expensive new infrastructure investments that could become stranded assets, Scheller said in a webinar on ICF’s “Powering the Future: Addressing Surging U.S. Electricity Demand” report. Policy conversations around import tariffs, federal energy tax credits and permitting reform are unfolding as the balance of electricity demand shifts from residential and business consumers to technology and industrial customers, which tend to require around-the-clock power, Scheller added. Dive Insight: The coming shift in U.S. electricity consumption represents less of a new paradigm than a return to the industrial-driven demand the country saw from the 1950s into the 1980s, after which deindustrialization and consumer-centric trends like the widespread adoption of air conditioning, electric resistance heating and personal computing shifted the balance toward the residential segment, Scheller said. The shift is important because unlike residential loads, which show considerable seasonal and intraday variation, industrial loads are flatter, less weather-dependent and more sensitive to voltage fluctuations, Scheller said. By 2035, ICF expects nearly 40% of total U.S. load will have a “flat, power-quality-sensitive profile,” and that overall load will grow faster than peak load, she said. In 2030, ICF projects more than 3% annual power consumption growth, compared with less than 2% annual peak load growth, according to a webinar slide. That’s not to say residential demand won’t also grow in the next few years as consumers electrify home heating and buy more electric vehicles — only that data centers and other industrial demand will “dwarf” it, Scheller

Trump attacks on NRC independence pose health, safety risks

Edwin Lyman is director of nuclear power safety at the Union of Concerned Scientists. A White House executive order issued last month targeting the independence of the Nuclear Regulatory Commission, the federal agency that oversees the safety and security of U.S. commercial nuclear facilities and materials, as well as the possibly illegal firing earlier this month of Commissioner Christopher Hanson by President Donald Trump, are raising serious concerns about the agency’s effectiveness as a regulator going forward. While I’ve often been a critic of the NRC for taking actions favoring the nuclear industry at the expense of public health and safety, preserving the NRC in its current form is the best hope for heading off a U.S. nuclear plant disaster like the 2011 Fukushima Daiichi reactor meltdowns in Japan. My long-standing beef with the NRC has primarily been with its political leadership, not with the rank-and-file staff of highly knowledgeable inspectors, analysts and researchers committed to helping ensure that nuclear power remains safe and secure. These professionals are well aware how quickly things can go south at a nuclear power plant without rigorous oversight. They know from experience what obscure corners to look in and what questions to ask. And they can tell — and are not afraid to push back — when they are getting sold snake oil by fly-by-night startups looking to make easy money by capitalizing on the current nuclear power craze. Technical rigor and expert judgment form the bedrock of this work. But when staff are compelled to sweep legitimate safety concerns under the rug in the interest of political expediency, many will leave rather than compromise their scientific integrity. So there is little wonder that a wave of experienced personnel is headed out the door in the wake of the executive order on NRC “reform,”

North America Loses Rigs Week on Week

North America dropped six rigs week on week, according to Baker Hughes’ latest North America rotary rig count, which was released on June 27. Although the U.S. dropped seven rigs week on week, Canada added one rig during the same timeframe, taking the total North America rig count down to 687, comprising 547 rigs from the U.S. and 140 rigs from Canada, the count outlined. Of the total U.S. rig count of 547, 533 rigs are categorized as land rigs, 12 are categorized as offshore rigs, and two are categorized as inland water rigs. The total U.S. rig count is made up of 432 oil rigs, 109 gas rigs, and six miscellaneous rigs, according to Baker Hughes’ count, which revealed that the U.S. total comprises 496 horizontal rigs, 38 directional rigs, and 13 vertical rigs. Week on week, the U.S. land rig count reduced by five, its offshore rig count decreased by two, and its inland water rig count remained unchanged, the count highlighted. The country’s oil rig count dropped by six, its gas rig count dropped by two, and its miscellaneous rig count increased by one, week on week, the count showed. The U.S. horizontal rig count dropped by six, its directional rig count dropped by two, and its vertical rig count increased by one, week on week, the count revealed. A major state variances subcategory included in the rig count showed that, week on week, Wyoming dropped five rigs, and Oklahoma, Louisiana, and Colorado each dropped one rig. A major basin variances subcategory included in Baker Hughes’ rig count showed that, week on week, the Granite Wash basin dropped one rig and the Permian basin dropped one rig. Canada’s total rig count of 140 is made up of 94 oil rigs and 46 gas rigs, Baker Hughes pointed

Datacenter industry calls for investment after EU issues water consumption warning

CISPE’s response to the European Commission’s report warns that the resulting regulatory uncertainty could hurt the region’s economy. “Imposing new, standalone water regulations could increase costs, create regulatory fragmentation, and deter investment. This risks shifting infrastructure outside the EU, undermining both sustainability and sovereignty goals,” CISPE said in its latest policy recommendation, Advancing water resilience through digital innovation and responsible stewardship. “Such regulatory uncertainty could also reduce Europe’s attractiveness for climate-neutral infrastructure investment at a time when other regions offer clear and stable frameworks for green data growth,” it added. CISPE’s recommendations are a mix of regulatory harmonization, increased investment, and technological improvement. Currently, water reuse regulation is directed towards agriculture. Updated regulation across the bloc would encourage more efficient use of water in industrial settings such as datacenters, the asosciation said. At the same time, countries struggling with limited public sector budgets are not investing enough in water infrastructure. This could only be addressed by tapping new investment by encouraging formal public-private partnerships (PPPs), it suggested: “Such a framework would enable the development of sustainable financing models that harness private sector innovation and capital, while ensuring robust public oversight and accountability.” Nevertheless, better water management would also require real-time data gathered through networks of IoT sensors coupled to AI analytics and prediction systems. To that end, cloud datacenters were less a drain on water resources than part of the answer: “A cloud-based approach would allow water utilities and industrial users to centralize data collection, automate operational processes, and leverage machine learning algorithms for improved decision-making,” argued CISPE.

HPE-Juniper deal clears DOJ hurdle, but settlement requires divestitures

In HPE’s press release following the court’s decision, the vendor wrote that “After close, HPE will facilitate limited access to Juniper’s advanced Mist AIOps technology.” In addition, the DOJ stated that the settlement requires HPE to divest its Instant On business and mandates that the merged firm license critical Juniper software to independent competitors. Specifically, HPE must divest its global Instant On campus and branch WLAN business, including all assets, intellectual property, R&D personnel, and customer relationships, to a DOJ-approved buyer within 180 days. Instant On is aimed primarily at the SMB arena and offers a cloud-based package of wired and wireless networking gear that’s designed for so-called out-of-the-box installation and minimal IT involvement, according to HPE. HPE and Juniper focused on the positive in reacting to the settlement. “Our agreement with the DOJ paves the way to close HPE’s acquisition of Juniper Networks and preserves the intended benefits of this deal for our customers and shareholders, while creating greater competition in the global networking market,” HPE CEO Antonio Neri said in a statement. “For the first time, customers will now have a modern network architecture alternative that can best support the demands of AI workloads. The combination of HPE Aruba Networking and Juniper Networks will provide customers with a comprehensive portfolio of secure, AI-native networking solutions, and accelerate HPE’s ability to grow in the AI data center, service provider and cloud segments.” “This marks an exciting step forward in delivering on a critical customer need – a complete portfolio of modern, secure networking solutions to connect their organizations and provide essential foundations for hybrid cloud and AI,” said Juniper Networks CEO Rami Rahim. “We look forward to closing this transaction and turning our shared vision into reality for enterprise, service provider and cloud customers.”

Data center costs surge up to 18% as enterprises face two-year capacity drought

“AI workloads, especially training and archival, can absorb 10-20ms latency variance if offset by 30-40% cost savings and assured uptime,” said Gogia. “Des Moines and Richmond offer better interconnection diversity today than some saturated Tier-1 hubs.” Contract flexibility is also crucial. Rather than traditional long-term leases, enterprises are negotiating shorter agreements with renewal options and exploring revenue-sharing arrangements tied to business performance. Maximizing what you have With expansion becoming more costly, enterprises are getting serious about efficiency through aggressive server consolidation, sophisticated virtualization and AI-driven optimization tools that squeeze more performance from existing space. The companies performing best in this constrained market are focusing on optimization rather than expansion. Some embrace hybrid strategies blending existing on-premises infrastructure with strategic cloud partnerships, reducing dependence on traditional colocation while maintaining control over critical workloads. The long wait When might relief arrive? CBRE’s analysis shows primary markets had a record 6,350 MW under construction at year-end 2024, more than double 2023 levels. However, power capacity constraints are forcing aggressive pre-leasing and extending construction timelines to 2027 and beyond. The implications for enterprises are stark: with construction timelines extending years due to power constraints, companies are essentially locked into current infrastructure for at least the next few years. Those adapting their strategies now will be better positioned when capacity eventually returns.

Cisco backs quantum networking startup Qunnect

In partnership with Deutsche Telekom’s T-Labs, Qunnect has set up quantum networking testbeds in New York City and Berlin. “Qunnect understands that quantum networking has to work in the real world, not just in pristine lab conditions,” Vijoy Pandey, general manager and senior vice president of Outshift by Cisco, stated in a blog about the investment. “Their room-temperature approach aligns with our quantum data center vision.” Cisco recently announced it is developing a quantum entanglement chip that could ultimately become part of the gear that will populate future quantum data centers. The chip operates at room temperature, uses minimal power, and functions using existing telecom frequencies, according to Pandey.

HPE announces GreenLake Intelligence, goes all-in with agentic AI

Like a teammate who never sleeps Agentic AI is coming to Aruba Central as well, with an autonomous supervisory module talking to multiple specialized models to, for example, determine the root cause of an issue and provide recommendations. David Hughes, SVP and chief product officer, HPE Aruba Networking, said, “It’s like having a teammate who can work while you’re asleep, work on problems, and when you arrive in the morning, have those proposed answers there, complete with chain of thought logic explaining how they got to their conclusions.” Several new services for FinOps and sustainability in GreenLake Cloud are also being integrated into GreenLake Intelligence, including a new workload and capacity optimizer, extended consumption analytics to help organizations control costs, and predictive sustainability forecasting and a managed service mode in the HPE Sustainability Insight Center. In addition, updates to the OpsRamp operations copilot, launched in 2024, will enable agentic automation including conversational product help, an agentic command center that enables AI/ML-based alerts, incident management, and root cause analysis across the infrastructure when it is released in the fourth quarter of 2025. It is now a validated observability solution for the Nvidia Enterprise AI Factory. OpsRamp will also be part of the new HPE CloudOps software suite, available in the fourth quarter, which will include HPE Morpheus Enterprise and HPE Zerto. HPE said the new suite will provide automation, orchestration, governance, data mobility, data protection, and cyber resilience for multivendor, multi cloud, multi-workload infrastructures. Matt Kimball, principal analyst for datacenter, compute, and storage at Moor Insights & strategy, sees HPE’s latest announcements aligning nicely with enterprise IT modernization efforts, using AI to optimize performance. “GreenLake Intelligence is really where all of this comes together. I am a huge fan of Morpheus in delivering an agnostic orchestration plane, regardless of operating stack

MEF goes beyond metro Ethernet, rebrands as Mplify with expanded scope on NaaS and AI

While MEF is only now rebranding, Vachon said that the scope of the organization had already changed by 2005. Instead of just looking at metro Ethernet, the organization at the time had expanded into carrier Ethernet requirements. The organization has also had a growing focus on solving the challenge of cross-provider automation, which is where the LSO framework fits in. LSO provides the foundation for an automation framework that allows providers to more efficiently deliver complex services across partner networks, essentially creating a standardized language for service integration. NaaS leadership and industry blueprint Building on the LSO automation framework, the organization has been working on efforts to help providers with network-as-a-service (NaaS) related guidance and specifications. The organization’s evolution toward NaaS reflects member-driven demands for modern service delivery models. Vachon noted that MEF member organizations were asking for help with NaaS, looking for direction on establishing common definitions and some standard work. The organization responded by developing comprehensive industry guidance. “In 2023 we launched the first blueprint, which is like an industry North Star document. It includes what we think about NaaS and the work we’re doing around it,” Vachon said. The NaaS blueprint encompasses the complete service delivery ecosystem, with APIs including last mile, cloud, data center and security services. (Read more about its vision for NaaS, including easy provisioning and integrated security across a federated network of providers)

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle