Stay Ahead, Stay ONMINE

Generative AI Is Declarative

ChatGPT launched in 2022 and kicked off the Generative Ai boom. In the two years since, academics, technologists, and armchair experts have written libraries worth of articles on the technical underpinnings of generative AI and about the potential capabilities of both current and future generative AI models. Surprisingly little has been written about how we interact with these tools—the human-AI interface. The point where we interact with AI models is at least as important as the algorithms and data that create them. “There is no success where there is no possibility of failure, no art without the resistance of the medium” (Raymond Chandler). In that vein, it’s useful to examine human-AI interaction and the strengths and weaknesses inherent in that interaction. If we understand the “resistance in the medium” then product managers can make smarter decisions about how to incorporate generative AI into their products. Executives can make smarter decisions about what capabilities to invest in. Engineers and designers can build around the tools’ limitations and showcase their strengths. Everyday people can know when to use generative AI and when not to. Imagine walking into a restaurant and ordering a cheeseburger. You don’t tell the chef how to grind the beef, how hot to set the grill, or how long to toast the bun. Instead, you simply describe what you want: “I’d like a cheeseburger, medium rare, with lettuce and tomato.” The chef interprets your request, handles the implementation, and delivers the desired outcome. This is the essence of declarative interaction—focusing on the what rather than the how. Now, imagine interacting with a Large Language Model (LLM) like ChatGPT. You don’t have to provide step-by-step instructions for how to generate a response. Instead, you describe the result you’re looking for: “A user story that lets us implement A/B testing for the Buy button on our website.” The LLM interprets your prompt, fills in the missing details, and delivers a response. Just like ordering a cheeseburger, this is a declarative mode of interaction. Explaining the steps to make a cheeseburger is an imperative interaction. Our LLM prompts sometimes feel imperative. We might phrase our prompts like a question: ”What is the tallest mountain on earth?” This is equivalent to describing “the answer to the question ‘What is the tallest mountain on earth?’” We might phrase our prompt as a series of instructions: ”Write a summary of the attached report, then read it as if you are a product manager, then type up some feedback on the report.” But, again, we’re describing the result of a process with some context for what that process is. In this case, it is a sequence of descriptive results—the report then the feedback. This is a more useful way to think about LLMs and generative AI. In some ways it is more accurate; the neural network model behind the curtain doesn’t explain why or how it produced one output instead of another. More importantly though, the limitations and strengths of generative AI make more sense and become more predictable when we think of these models as declarative. LLMs as a declarative mode of interaction Computer scientists use the term “declarative” to describe coding languages. SQL is one of the most common. The code describes the output table and the procedures in the database figure out how to retrieve and combine the data to produce the result. LLMs share many of the benefits of declarative languages like SQL or declarative interactions like ordering a cheeseburger. Focus on desired outcome: Just as you describe the cheeseburger you want, you describe the output you want from the LLM. For example, “Summarize this article in three bullet points” focuses on the result, not the process. Abstraction of implementation: When you order a cheeseburger, you don’t need to know how the chef prepares it. When submitting SQL code to a server, the server figures out where the data lives, how to fetch it, and how to aggregate it based on your description. You as the user don’t need to know how. With LLMs, you don’t need to know how the model generates the response. The underlying mechanisms are abstracted away. Filling in missing details: If you don’t specify onions on your cheeseburger, the chef won’t include them. If you don’t specify a field in your SQL code, it won’t show up in the output table. This is where LLMs differ slightly from declarative coding languages like SQL. If you ask ChatGPT to create an image of “a cheeseburger with lettuce and tomato” it may also show the burger on a sesame seed bun or include pickles, even if that wasn’t in your description. The details you omit are inferred by the LLM using the “average” or “most likely” detail depending on the context, with a bit of randomness thrown in. Ask for the cheeseburger image six times; it may show you three burgers with cheddar cheese, two with Swiss, and one with pepper jack. Like other forms of declarative interaction, LLMs share one key limitation. If your description is vague, ambiguous, or lacks enough detail, then the result may not be what you hoped to see. It is up to the user to describe the result with sufficient detail. This explains why we often iterate to get what we’re looking for when using LLMs and generative AI. Going back to our cheeseburger analogy, the process to generate a cheeseburger from an LLM may look like this. “Make me a cheeseburger, medium rare, with lettuce and tomatoes.” The result also has pickles and uses cheddar cheese. The bun is toasted. There’s mayo on the top bun. “Make the same thing but this time no pickles, use pepper jack cheese, and a sriracha mayo instead of plain mayo.” The result now has pepper jack, no pickles. The sriracha mayo is applied to the bottom bun and the bun is no longer toasted. “Make the same thing again, but this time, put the sriracha mayo on the top bun. The buns should be toasted.” Finally, you have the cheeseburger you’re looking for. This example demonstrates one of the main points of friction with human-AI interaction. Human beings are really bad at describing what they want with sufficient detail on the first attempt. When we asked for a cheeseburger, we had to refine our description to be more specific (the type of cheese). In the second generation, some of the inferred details (whether the bun was toasted) changed from one iteration to the next, so then we had to add that specificity to our description as well. Iteration is an important part of AI-human generation. Insight: When using generative AI, we need to design an iterative human-AI interaction loop that enables people to discover the details of what they want and refine their descriptions accordingly. To iterate, we need to evaluate the results. Evaluation is extremely important with generative AI. Say you’re using an LLM to write code. You can evaluate the code quality if you know enough to understand it or if you can execute it and inspect the results. On the other hand, hypothetical questions can’t be tested. Say you ask ChatGPT, “What if we raise our product prices by 5 percent?” A seasoned expert could read the output and know from experience if a recommendation doesn’t take into account important details. If your product is property insurance, then increasing premiums by 5 percent may mean pushback from regulators, something an experienced veteran of the industry would know. For non-experts in a topic, there’s no way to tell if the “average” details inferred by the model make sense for your specific use case. You can’t test and iterate. Insight: LLMs work best when the user can evaluate the result quickly, whether through execution or through prior knowledge. The examples so far involve general knowledge. We all know what a cheeseburger is. When you start asking about non-general information—like when you can make dinner reservations next week—you delve into new points of friction. In the next section we’ll think about different types of information, what we can expect the AI to “know”, and how this impacts human-AI interaction. What did the AI know, and when did it know it? Above, I explained how generative AI is a declarative mode of interaction and how that helps understand its strengths and weaknesses. Here, I’ll identify how different types of information create better or worse human-AI interactions. Understanding the information available When we describe what we want to an LLM, and when it infers missing details from our description, it draws from different sources of information. Understanding these sources of information is important. Here’s a useful taxonomy for information types: General information used to train the base model. Non-general information that the base model is not aware of. Fresh information that is new or changes rapidly, like stock prices or current events. Non-public information, like facts about you and where you live or about your company, its employees, its processes, or its codebase. General information vs. non-general information LLMs are built on a massive corpus of written word data. A large part of GPT-3 was trained on a combination of books, journals, Wikipedia, Reddit, and CommonCrawl (an open-source repository of web crawl data). You can think of the models as a highly compressed version of that data, organized in a gestalt manner—all the like things are close together. When we submit a prompt, the model takes the words we use (and any words added to the prompt behind the scenes) and finds the closest set of related words based on how those things appear in the data corpus. So when we say “cheeseburger” it knows that word is related to “bun” and “tomato” and “lettuce” and “pickles” because they all occur in the same context throughout many data sources. Even when we don’t specify pickles, it uses this gestalt approach to fill in the blanks. This training information is general information, and a good rule of thumb is this: if it was in Wikipedia a year ago then the LLM “knows” about it. There could be new articles on Wikipedia, but that didn’t exist when the model was trained. The LLM doesn’t know about that unless told. Now, say you’re a company using an LLM to write a product requirements document for a new web app feature. Your company, like most companies, is full of its own lingo. It has its own lore and history scattered across thousands of Slack messages, emails, documents, and some tenured employees who remember that one meeting in Q1 last year. The LLM doesn’t know any of that. It will infer any missing details from general information. You need to supply everything else. If it wasn’t in Wikipedia a year ago, the LLM doesn’t know about it. The resulting product requirements document may be full of general facts about your industry and product but could lack important details specific to your firm. This is non-general information. This includes personal info, anything kept behind a log-in or paywall, and non-digital information. This non-general information permeates our lives, and incorporating it is another source of friction when working with generative AI. Non-general information can be incorporated into a generative AI application in three ways: Through model fine-tuning (supplying a large corpus to the base model to expand its reference data). Retrieved and fed it to the model at query time (e.g., the retrieval augmented generation or “RAG” technique). Supplied by the user in the prompt. Insight: When designing any human-AI interactions, you should think about what non-general information is required, where you will get it, and how you will expose it to the AI. Fresh information Any information that changes in real-time or is new can be called fresh information. This includes new facts like current events but also frequently changing facts like your bank account balance. If the fresh information is available in a database or some searchable source, then it needs to be retrieved and incorporated into the application. To retrieve the information from a database, the LLM must create a query, which may require specific details that the user didn’t include. Here’s an example. I have a chatbot that gives information on the stock market. You, the user, type the following: “What is the current price of Apple? Has it been increasing or decreasing recently?” The LLM doesn’t have the current price of Apple in its training data. This is fresh, non-general information. So, we need to retrieve it from a database. The LLM can read “Apple”, know that you’re talking about the computer company, and that the ticker symbol is AAPL. This is all general information. What about the “increasing or decreasing” part of the prompt? You did not specify over what period—increasing in the past day, month, year? In order to construct a database query, we need more detail. LLMs are bad at knowing when to ask for detail and when to fill it in. The application could easily pull the wrong data and provide an unexpected or inaccurate answer. Only you know what these details should be, depending on your intent. You must be more specific in your prompt. A designer of this LLM application can improve the user experience by specifying required parameters for expected queries. We can ask the user to explicitly input the time range or design the chatbot to ask for more specific details if not provided. In either case, we need to have a specific type of query in mind and explicitly design how to handle it. The LLM will not know how to do this unassisted. Insight: If a user is expecting a more specific type of output, you need to explicitly ask for enough detail. Too little detail could produce a poor quality output. Non-public information Incorporating non-public information into an LLM prompt can be done if that information can be accessed in a database. This introduces privacy issues (should the LLM be able to access my medical records?) and complexity when incorporating multiple non-public sources of information. Let’s say I have a chatbot that helps you make dinner reservations. You, the user, type the following: “Help me make dinner reservations somewhere with good Neapolitan pizza.” The LLM knows what a Neapolitan pizza is and can infer that “dinner” means this is for an evening meal. To do this task well, it needs information about your location, the restaurants near you and their booking status, or even personal details like dietary restrictions. Assuming all that non-public information is available in databases, bringing them all together into the prompt takes a lot of engineering work. Even if the LLM could find the “best” restaurant for you and book the reservation, can you be confident it has done that correctly? You never specified how many people you need a reservation for. Since only you know this information, the application needs to ask for it upfront. If you’re designing this LLM-based application, you can make some thoughtful choices to help with these problems. We could ask about a user’s dietary restrictions when they sign up for the app. Other information, like the user’s schedule that evening, can be given in a prompting tip or by showing the default prompt option “show me reservations for two for tomorrow at 7PM”. Promoting tips may not feel as automagical as a bot that does it all, but they are a straightforward way to collect and integrate the non-public information. Some non-public information is large and can’t be quickly collected and processed when the prompt is given. These need to be fine-tuned in batch or retrieved at prompt time and incorporated. A chatbot that answers information about a company’s HR policies can obtain this information from a corpus of non-public HR documents. You can fine-tune the model ahead of time by feeding it the corpus. Or you can implement a retrieval augmented generation technique, searching a corpus for relevant documents and summarizing the results. Either way, the response will only be as accurate and up-to-date as the corpus itself. Insight: When designing an AI application, you need to be aware of non-public information and how to retrieve it. Some of that information can be pulled from databases. Some needs to come from the user, which may require prompt suggestions or explicitly asking. If you understand the types of information and treat human-AI interaction as declarative, you can more easily predict which AI applications will work and which ones won’t. In the next section we’ll look at OpenAI’s Operator and deep research products. Using this framework, we can see where these applications fall short, where they work well, and why. Critiquing OpenAI’s Operator and deep research through a declarative lens I have now explained how thinking of generative AI as declarative helps us understand its strengths and weaknesses. I also identified how different types of information create better or worse human-AI interactions. Now I’ll apply these ideas by critiquing two recent products from OpenAI—Operator and deep research. It’s important to be honest about the shortcomings of AI applications. Bigger models trained on more data or using new techniques might one day solve some issues with generative AI. But other issues arise from the human-AI interaction itself and can only be addressed by making appropriate design and product choices. These critiques demonstrate how the framework can help identify where the limitations are and how to address them. The limitations of Operator Journalist Casey Newton of Platformer reviewed Operator in an article that was largely positive. Newton has covered AI extensively and optimistically. Still, Newton couldn’t help but point out some of Operator’s frustrating limitations. [Operator] can take action on your behalf in ways that are new to AI systems — but at the moment it requires a lot of hand-holding, and may cause you to throw up your hands in frustration.  My most frustrating experience with Operator was my first one: trying to order groceries. “Help me buy groceries on Instacart,” I said, expecting it to ask me some basic questions. Where do I live? What store do I usually buy groceries from? What kinds of groceries do I want?  It didn’t ask me any of that. Instead, Operator opened Instacart in the browser tab and begin searching for milk in grocery stores located in Des Moines, Iowa. The prompt “Help me buy groceries on Instacart,” viewed declaratively, describes groceries being purchased using Instacart. It doesn’t have a lot of the information someone would need to buy groceries, like what exactly to buy, when it would be delivered, and to where. It’s worth repeating: LLMs are not good at knowing when to ask additional questions unless explicitly programmed to do so in the use case. Newton gave a vague request and expected follow-up questions. Instead, the LLM filled in all the missing details with the “average”. The average item was milk. The average location was Des Moines, Iowa. Newton doesn’t mention when it was scheduled to be delivered, but if the “average” delivery time is tomorrow, then that was likely the default. If we engineered this application specifically for ordering groceries, keeping in mind the declarative nature of AI and the information it “knows”, then we could make thoughtful design choices that improve functionality. We would need to prompt the user to specify when and where they want groceries up front (non-public information). With that information, we could find an appropriate grocery store near them. We would need access to that grocery store’s inventory (more non-public information). If we have access to the user’s previous orders, we could also pre-populate a cart with items typical to their order. If not, we may add a few suggested items and guide them to add more. By limiting the use case, we only have to deal with two sources of non-public information. This is a more tractable problem than Operator’s “agent that does it all” approach. Newton also mentions that this process took eight minutes to complete, and “complete” means that Operator did everything up to placing the order. This is a long time with very little human-in-the-loop iteration. Like we said before, an iteration loop is very important for human-AI interaction. A better-designed application would generate smaller steps along the way and provide more frequent interaction. We could prompt the user to describe what to add to their shopping list. The user might say, “Add barbeque sauce to the list,” and see the list update. If they see a vinegar-based barbecue sauce, they can refine that by saying, “Replace that with a barbeque sauce that goes well with chicken,” and might be happier when it’s replaced by a honey barbecue sauce. These frequent iterations make the LLM a creative tool rather than a does-it-all agent. The does-it-all agent looks automagical in marketing, but a more guided approach provides more utility with a less frustrating and more delightful experience. Elsewhere in the article, Newton gives an example of a prompt that Operator performed well: “Put together a lesson plan on the Great Gatsby for high school students, breaking it into readable chunks and then creating assignments and connections tied to the Common Core learning standard.” This prompt describes an output using much more specificity. It also solely relies on general information—the Great Gatsby, the Common Core standard, and a general sense of what assignments are. The general-information use case lends itself better to AI generation, and the prompt is explicit and detailed in its request. In this case, very little guidance was given to create the prompt, so it worked better. (In fact, this prompt comes from Ethan Mollick who has used it to evaluate AI chatbots.) This is the risk of general-purpose AI applications like Operator. The quality of the result relies heavily on the use case and specificity provided by the user. An application with a more specific use case allows for more design guidance and can produce better output more reliably. The limitations of deep research Newton also reviewed deep research, which, according to OpenAI’s website, is an “agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you.” Deep research came out after Newton’s review of Operator. Newton chose an intentionally tricky prompt that prods at some of the tool’s limitations regarding fresh information and non-general information: “I wanted to see how OpenAI’s agent would perform given that it was researching a story that was less than a day old, and for which much of the coverage was behind paywalls that the agent would not be able to access. And indeed, the bot struggled more than I expected.” Near the end of the article, Newton elaborates on some of the shortcomings he noticed with deep research. OpenAI’s deep research suffers from the same design problem that almost all AI products have: its superpowers are completely invisible and must be harnessed through a frustrating process of trial and error. Generally speaking, the more you already know about something, the more useful I think deep research is. This may be somewhat counterintuitive; perhaps you expected that an AI agent would be well suited to getting you up to speed on an important topic that just landed on your lap at work, for example.  In my early tests, the reverse felt true. Deep research excels for drilling deep into subjects you already have some expertise in, letting you probe for specific pieces of information, types of analysis, or ideas that are new to you. The “frustrating trial and error” shows a mismatch between Newton’s expectations and a necessary aspect of many generative AI applications. A good response requires more information than the user will probably give in the first attempt. The challenge is to design the application and set the user’s expectations so that this interaction is not frustrating but exciting. Newton’s more poignant criticism is that the application requires already knowing something about the topic for it to work well. From the perspective of our framework, this makes sense. The more you know about a topic, the more detail you can provide. And as you iterate, having knowledge about a topic helps you observe and evaluate the output. Without the ability to describe it well or evaluate the results, the user is less likely to use the tool to generate good output. A version of deep research designed for lawyers to perform legal research could be powerful. Lawyers have an extensive and common vocabulary for describing legal matters, and they’re more likely to see a result and know if it makes sense. Generative AI tools are fallible, though. So, the tool should focus on a generation-evaluation loop rather than writing a final draft of a legal document. The article also highlights many improvements compared to Operator. Most notably, the bot asked clarifying questions. This is the most impressive aspect of the tool. Undoubtedly, it helps that deep search has a focused use-case of retrieving and summarizing general information instead of a does-it-all approach. Having a focused use case narrows the set of likely interactions, letting you design better guidance into the prompt flow. Good application design with generative AI Designing effective generative AI applications requires thoughtful consideration of how users interact with the technology, the types of information they need, and the limitations of the underlying models. Here are some key principles to guide the design of generative AI tools: 1. Constrain the input and focus on providing details Applications are inputs and outputs. We want the outputs to be useful and pleasant. By giving a user a conversational chatbot interface, we allow for a vast surface area of potential inputs, making it a challenge to guarantee useful outputs. One strategy is to limit or guide the input to a more manageable subset. For example, FigJam, a collaborative whiteboarding tool, uses pre-set template prompts for timelines, Gantt charts, and other common whiteboard artifacts. This provides some structure and predictability to the inputs. Users still have the freedom to describe further details like color or the content for each timeline event. This approach ensures that the AI has enough specificity to generate meaningful outputs while giving users creative control. 2. Design frequent iteration and evaluation into the tool Iterating in a tight generation-evaluation loop is essential for refining outputs and ensuring they meet user expectations. OpenAI’s Dall-E is great at this. Users quickly iterate on image prompts and refine their descriptions to add additional detail. If you type “a picture of a cheeseburger on a plate”, you may then add more detail by specifying “with pepperjack cheese”. AI code generating tools work well because users can run a generated code snippet immediately to see if it works, enabling rapid iteration and validation. This quick evaluation loop produces better results and a better coder experience.  Designers of generative AI applications should pull the user in the loop early, often, in a way that is engaging rather than frustrating. Designers should also consider the user’s knowledge level. Users with domain expertise can iterate more effectively. Referring back to the FigJam example, the prompts and icons in the app quickly communicate “this is what we call a mind map” or “this is what we call a gantt chart” for users who want to generate these artifacts but don’t know the terms for them. Giving the user some basic vocabulary can help them better generate desired results quickly with less frustration. 3. Be mindful of the types of information needed LLMs excel at tasks involving general knowledge already in the base training set. For example, writing class assignments involves absorbing general information, synthesizing it, and producing a written output, so LLMs are very well-suited for that task. Use cases that require non-general information are more complex. Some questions the designer and engineer should ask include: Does this application require fresh information? Maybe this is knowledge of current events or a user’s current bank account balance. If so, that information needs to be retrieved and incorporated into the model. How much non-general information does the LLM need to know? If it’s a lot of information—like a corpus of company documentation and communication—then the model may need to be fine tuned in batch ahead of time. If the information is relatively small, a retrieval augmented generation (RAG) approach at query time may suffice.  How many sources of non-general information—small and finite or potentially infinite? General purpose agents like Operator face the challenge of potentially infinite non-general information sources. Depending on what the user requires, it could need to access their contacts, restaurant reservation lists, financial data, or even other people’s calendars. A single-purpose restaurant reservation chatbot may only need access to Yelp, OpenTable, and the user’s calendar. It’s much easier to reconcile access and authentication for a handful of known data sources. Is there context-specific information that can only come from the user? Consider our restaurant reservation chatbot. Is the user making reservations for just themselves? Probably not. “How many people and who” is a detail that only the user can provide, an example of non-public information that only the user knows. We shouldn’t expect the user to provide this information upfront and unguided. Instead, we can use prompt suggestions so they include the information. We may even be able to design the LLM to ask these questions when the detail is not provided. 4. Focus on specific use cases Broad, all-purpose chatbots often struggle to deliver consistent results due to the complexity and variability of user needs. Instead, focus on specific use cases where the AI’s shortcomings can be mitigated through thoughtful design. Narrowing the scope helps us address many of the issues above. We can identify common requests for the use case and incorporate those into prompt suggestions. We can design an iteration loop that works well with the type of thing we’re generating. We can identify sources of non-general information and devise solutions to incorporate it into the model or prompt. 5. Translation or summary tasks work well A common task for ChatGPT is to rewrite something in a different style, explain what some computer code is doing, or summarize a long document. These tasks involve converting a set of information from one form to another. We have the same concerns about non-general information and context. For instance, a Chatbot asked to explain a code script doesn’t know the system that script is part of unless that information is provided. But in general, the task of transforming or summarizing information is less prone to missing details. By definition, you have provided the details it needs. The result should have the same information in a different or more condensed form. The exception to the rules There is a case when it doesn’t matter if you break any or all of these rules—when you’re just having fun. LLMs are creative tools by nature. They can be an easel to paint on, a sandbox to build in, a blank sheet to scribe. Iteration is still important; the user wants to see the thing they’re creating as they create it. But unexpected results due to lack of information or omitted details may add to the experience. If you ask for a cheeseburger recipe, you might get some funny or interesting ingredients. If the stakes are low and the process is its own reward, don’t worry about the rules.

ChatGPT launched in 2022 and kicked off the Generative Ai boom. In the two years since, academics, technologists, and armchair experts have written libraries worth of articles on the technical underpinnings of generative AI and about the potential capabilities of both current and future generative AI models.

Surprisingly little has been written about how we interact with these tools—the human-AI interface. The point where we interact with AI models is at least as important as the algorithms and data that create them. “There is no success where there is no possibility of failure, no art without the resistance of the medium” (Raymond Chandler). In that vein, it’s useful to examine human-AI interaction and the strengths and weaknesses inherent in that interaction. If we understand the “resistance in the medium” then product managers can make smarter decisions about how to incorporate generative AI into their products. Executives can make smarter decisions about what capabilities to invest in. Engineers and designers can build around the tools’ limitations and showcase their strengths. Everyday people can know when to use generative AI and when not to.

Imagine walking into a restaurant and ordering a cheeseburger. You don’t tell the chef how to grind the beef, how hot to set the grill, or how long to toast the bun. Instead, you simply describe what you want: “I’d like a cheeseburger, medium rare, with lettuce and tomato.” The chef interprets your request, handles the implementation, and delivers the desired outcome. This is the essence of declarative interaction—focusing on the what rather than the how.

Now, imagine interacting with a Large Language Model (LLM) like ChatGPT. You don’t have to provide step-by-step instructions for how to generate a response. Instead, you describe the result you’re looking for: “A user story that lets us implement A/B testing for the Buy button on our website.” The LLM interprets your prompt, fills in the missing details, and delivers a response. Just like ordering a cheeseburger, this is a declarative mode of interaction.

Explaining the steps to make a cheeseburger is an imperative interaction. Our LLM prompts sometimes feel imperative. We might phrase our prompts like a question: ”What is the tallest mountain on earth?” This is equivalent to describing “the answer to the question ‘What is the tallest mountain on earth?’” We might phrase our prompt as a series of instructions: ”Write a summary of the attached report, then read it as if you are a product manager, then type up some feedback on the report.” But, again, we’re describing the result of a process with some context for what that process is. In this case, it is a sequence of descriptive results—the report then the feedback.

This is a more useful way to think about LLMs and generative AI. In some ways it is more accurate; the neural network model behind the curtain doesn’t explain why or how it produced one output instead of another. More importantly though, the limitations and strengths of generative AI make more sense and become more predictable when we think of these models as declarative.

LLMs as a declarative mode of interaction

Computer scientists use the term “declarative” to describe coding languages. SQL is one of the most common. The code describes the output table and the procedures in the database figure out how to retrieve and combine the data to produce the result. LLMs share many of the benefits of declarative languages like SQL or declarative interactions like ordering a cheeseburger.

  1. Focus on desired outcome: Just as you describe the cheeseburger you want, you describe the output you want from the LLM. For example, “Summarize this article in three bullet points” focuses on the result, not the process.
  2. Abstraction of implementation: When you order a cheeseburger, you don’t need to know how the chef prepares it. When submitting SQL code to a server, the server figures out where the data lives, how to fetch it, and how to aggregate it based on your description. You as the user don’t need to know how. With LLMs, you don’t need to know how the model generates the response. The underlying mechanisms are abstracted away.
  3. Filling in missing details: If you don’t specify onions on your cheeseburger, the chef won’t include them. If you don’t specify a field in your SQL code, it won’t show up in the output table. This is where LLMs differ slightly from declarative coding languages like SQL. If you ask ChatGPT to create an image of “a cheeseburger with lettuce and tomato” it may also show the burger on a sesame seed bun or include pickles, even if that wasn’t in your description. The details you omit are inferred by the LLM using the “average” or “most likely” detail depending on the context, with a bit of randomness thrown in. Ask for the cheeseburger image six times; it may show you three burgers with cheddar cheese, two with Swiss, and one with pepper jack.

Like other forms of declarative interaction, LLMs share one key limitation. If your description is vague, ambiguous, or lacks enough detail, then the result may not be what you hoped to see. It is up to the user to describe the result with sufficient detail.

This explains why we often iterate to get what we’re looking for when using LLMs and generative AI. Going back to our cheeseburger analogy, the process to generate a cheeseburger from an LLM may look like this.

  • “Make me a cheeseburger, medium rare, with lettuce and tomatoes.” The result also has pickles and uses cheddar cheese. The bun is toasted. There’s mayo on the top bun.
  • “Make the same thing but this time no pickles, use pepper jack cheese, and a sriracha mayo instead of plain mayo.” The result now has pepper jack, no pickles. The sriracha mayo is applied to the bottom bun and the bun is no longer toasted.
  • “Make the same thing again, but this time, put the sriracha mayo on the top bun. The buns should be toasted.” Finally, you have the cheeseburger you’re looking for.

This example demonstrates one of the main points of friction with human-AI interaction. Human beings are really bad at describing what they want with sufficient detail on the first attempt.

When we asked for a cheeseburger, we had to refine our description to be more specific (the type of cheese). In the second generation, some of the inferred details (whether the bun was toasted) changed from one iteration to the next, so then we had to add that specificity to our description as well. Iteration is an important part of AI-human generation.

Insight: When using generative AI, we need to design an iterative human-AI interaction loop that enables people to discover the details of what they want and refine their descriptions accordingly.

To iterate, we need to evaluate the results. Evaluation is extremely important with generative AI. Say you’re using an LLM to write code. You can evaluate the code quality if you know enough to understand it or if you can execute it and inspect the results. On the other hand, hypothetical questions can’t be tested. Say you ask ChatGPT, “What if we raise our product prices by 5 percent?” A seasoned expert could read the output and know from experience if a recommendation doesn’t take into account important details. If your product is property insurance, then increasing premiums by 5 percent may mean pushback from regulators, something an experienced veteran of the industry would know. For non-experts in a topic, there’s no way to tell if the “average” details inferred by the model make sense for your specific use case. You can’t test and iterate.

Insight: LLMs work best when the user can evaluate the result quickly, whether through execution or through prior knowledge.

The examples so far involve general knowledge. We all know what a cheeseburger is. When you start asking about non-general information—like when you can make dinner reservations next week—you delve into new points of friction.

In the next section we’ll think about different types of information, what we can expect the AI to “know”, and how this impacts human-AI interaction.

What did the AI know, and when did it know it?

Above, I explained how generative AI is a declarative mode of interaction and how that helps understand its strengths and weaknesses. Here, I’ll identify how different types of information create better or worse human-AI interactions.

Understanding the information available

When we describe what we want to an LLM, and when it infers missing details from our description, it draws from different sources of information. Understanding these sources of information is important. Here’s a useful taxonomy for information types:

  • General information used to train the base model.
  • Non-general information that the base model is not aware of.
    • Fresh information that is new or changes rapidly, like stock prices or current events.
    • Non-public information, like facts about you and where you live or about your company, its employees, its processes, or its codebase.

General information vs. non-general information

LLMs are built on a massive corpus of written word data. A large part of GPT-3 was trained on a combination of books, journals, Wikipedia, Reddit, and CommonCrawl (an open-source repository of web crawl data). You can think of the models as a highly compressed version of that data, organized in a gestalt manner—all the like things are close together. When we submit a prompt, the model takes the words we use (and any words added to the prompt behind the scenes) and finds the closest set of related words based on how those things appear in the data corpus. So when we say “cheeseburger” it knows that word is related to “bun” and “tomato” and “lettuce” and “pickles” because they all occur in the same context throughout many data sources. Even when we don’t specify pickles, it uses this gestalt approach to fill in the blanks.

This training information is general information, and a good rule of thumb is this: if it was in Wikipedia a year ago then the LLM “knows” about it. There could be new articles on Wikipedia, but that didn’t exist when the model was trained. The LLM doesn’t know about that unless told.

Now, say you’re a company using an LLM to write a product requirements document for a new web app feature. Your company, like most companies, is full of its own lingo. It has its own lore and history scattered across thousands of Slack messages, emails, documents, and some tenured employees who remember that one meeting in Q1 last year. The LLM doesn’t know any of that. It will infer any missing details from general information. You need to supply everything else. If it wasn’t in Wikipedia a year ago, the LLM doesn’t know about it. The resulting product requirements document may be full of general facts about your industry and product but could lack important details specific to your firm.

This is non-general information. This includes personal info, anything kept behind a log-in or paywall, and non-digital information. This non-general information permeates our lives, and incorporating it is another source of friction when working with generative AI.

Non-general information can be incorporated into a generative AI application in three ways:

  • Through model fine-tuning (supplying a large corpus to the base model to expand its reference data).
  • Retrieved and fed it to the model at query time (e.g., the retrieval augmented generation or “RAG” technique).
  • Supplied by the user in the prompt.

Insight: When designing any human-AI interactions, you should think about what non-general information is required, where you will get it, and how you will expose it to the AI.

Fresh information

Any information that changes in real-time or is new can be called fresh information. This includes new facts like current events but also frequently changing facts like your bank account balance. If the fresh information is available in a database or some searchable source, then it needs to be retrieved and incorporated into the application. To retrieve the information from a database, the LLM must create a query, which may require specific details that the user didn’t include.

Here’s an example. I have a chatbot that gives information on the stock market. You, the user, type the following: “What is the current price of Apple? Has it been increasing or decreasing recently?”

  • The LLM doesn’t have the current price of Apple in its training data. This is fresh, non-general information. So, we need to retrieve it from a database.
  • The LLM can read “Apple”, know that you’re talking about the computer company, and that the ticker symbol is AAPL. This is all general information.
  • What about the “increasing or decreasing” part of the prompt? You did not specify over what period—increasing in the past day, month, year? In order to construct a database query, we need more detail. LLMs are bad at knowing when to ask for detail and when to fill it in. The application could easily pull the wrong data and provide an unexpected or inaccurate answer. Only you know what these details should be, depending on your intent. You must be more specific in your prompt.

A designer of this LLM application can improve the user experience by specifying required parameters for expected queries. We can ask the user to explicitly input the time range or design the chatbot to ask for more specific details if not provided. In either case, we need to have a specific type of query in mind and explicitly design how to handle it. The LLM will not know how to do this unassisted.

Insight: If a user is expecting a more specific type of output, you need to explicitly ask for enough detail. Too little detail could produce a poor quality output.

Non-public information

Incorporating non-public information into an LLM prompt can be done if that information can be accessed in a database. This introduces privacy issues (should the LLM be able to access my medical records?) and complexity when incorporating multiple non-public sources of information.

Let’s say I have a chatbot that helps you make dinner reservations. You, the user, type the following: “Help me make dinner reservations somewhere with good Neapolitan pizza.”

  • The LLM knows what a Neapolitan pizza is and can infer that “dinner” means this is for an evening meal.
  • To do this task well, it needs information about your location, the restaurants near you and their booking status, or even personal details like dietary restrictions. Assuming all that non-public information is available in databases, bringing them all together into the prompt takes a lot of engineering work.
  • Even if the LLM could find the “best” restaurant for you and book the reservation, can you be confident it has done that correctly? You never specified how many people you need a reservation for. Since only you know this information, the application needs to ask for it upfront.

If you’re designing this LLM-based application, you can make some thoughtful choices to help with these problems. We could ask about a user’s dietary restrictions when they sign up for the app. Other information, like the user’s schedule that evening, can be given in a prompting tip or by showing the default prompt option “show me reservations for two for tomorrow at 7PM”. Promoting tips may not feel as automagical as a bot that does it all, but they are a straightforward way to collect and integrate the non-public information.

Some non-public information is large and can’t be quickly collected and processed when the prompt is given. These need to be fine-tuned in batch or retrieved at prompt time and incorporated. A chatbot that answers information about a company’s HR policies can obtain this information from a corpus of non-public HR documents. You can fine-tune the model ahead of time by feeding it the corpus. Or you can implement a retrieval augmented generation technique, searching a corpus for relevant documents and summarizing the results. Either way, the response will only be as accurate and up-to-date as the corpus itself.

Insight: When designing an AI application, you need to be aware of non-public information and how to retrieve it. Some of that information can be pulled from databases. Some needs to come from the user, which may require prompt suggestions or explicitly asking.

If you understand the types of information and treat human-AI interaction as declarative, you can more easily predict which AI applications will work and which ones won’t. In the next section we’ll look at OpenAI’s Operator and deep research products. Using this framework, we can see where these applications fall short, where they work well, and why.

Critiquing OpenAI’s Operator and deep research through a declarative lens

I have now explained how thinking of generative AI as declarative helps us understand its strengths and weaknesses. I also identified how different types of information create better or worse human-AI interactions.

Now I’ll apply these ideas by critiquing two recent products from OpenAI—Operator and deep research. It’s important to be honest about the shortcomings of AI applications. Bigger models trained on more data or using new techniques might one day solve some issues with generative AI. But other issues arise from the human-AI interaction itself and can only be addressed by making appropriate design and product choices.

These critiques demonstrate how the framework can help identify where the limitations are and how to address them.

The limitations of Operator

Journalist Casey Newton of Platformer reviewed Operator in an article that was largely positive. Newton has covered AI extensively and optimistically. Still, Newton couldn’t help but point out some of Operator’s frustrating limitations.

[Operator] can take action on your behalf in ways that are new to AI systems — but at the moment it requires a lot of hand-holding, and may cause you to throw up your hands in frustration. 

My most frustrating experience with Operator was my first one: trying to order groceries. “Help me buy groceries on Instacart,” I said, expecting it to ask me some basic questions. Where do I live? What store do I usually buy groceries from? What kinds of groceries do I want? 

It didn’t ask me any of that. Instead, Operator opened Instacart in the browser tab and begin searching for milk in grocery stores located in Des Moines, Iowa.

The prompt “Help me buy groceries on Instacart,” viewed declaratively, describes groceries being purchased using Instacart. It doesn’t have a lot of the information someone would need to buy groceries, like what exactly to buy, when it would be delivered, and to where.

It’s worth repeating: LLMs are not good at knowing when to ask additional questions unless explicitly programmed to do so in the use case. Newton gave a vague request and expected follow-up questions. Instead, the LLM filled in all the missing details with the “average”. The average item was milk. The average location was Des Moines, Iowa. Newton doesn’t mention when it was scheduled to be delivered, but if the “average” delivery time is tomorrow, then that was likely the default.

If we engineered this application specifically for ordering groceries, keeping in mind the declarative nature of AI and the information it “knows”, then we could make thoughtful design choices that improve functionality. We would need to prompt the user to specify when and where they want groceries up front (non-public information). With that information, we could find an appropriate grocery store near them. We would need access to that grocery store’s inventory (more non-public information). If we have access to the user’s previous orders, we could also pre-populate a cart with items typical to their order. If not, we may add a few suggested items and guide them to add more. By limiting the use case, we only have to deal with two sources of non-public information. This is a more tractable problem than Operator’s “agent that does it all” approach.

Newton also mentions that this process took eight minutes to complete, and “complete” means that Operator did everything up to placing the order. This is a long time with very little human-in-the-loop iteration. Like we said before, an iteration loop is very important for human-AI interaction. A better-designed application would generate smaller steps along the way and provide more frequent interaction. We could prompt the user to describe what to add to their shopping list. The user might say, “Add barbeque sauce to the list,” and see the list update. If they see a vinegar-based barbecue sauce, they can refine that by saying, “Replace that with a barbeque sauce that goes well with chicken,” and might be happier when it’s replaced by a honey barbecue sauce. These frequent iterations make the LLM a creative tool rather than a does-it-all agent. The does-it-all agent looks automagical in marketing, but a more guided approach provides more utility with a less frustrating and more delightful experience.

Elsewhere in the article, Newton gives an example of a prompt that Operator performed well: “Put together a lesson plan on the Great Gatsby for high school students, breaking it into readable chunks and then creating assignments and connections tied to the Common Core learning standard.” This prompt describes an output using much more specificity. It also solely relies on general information—the Great Gatsby, the Common Core standard, and a general sense of what assignments are. The general-information use case lends itself better to AI generation, and the prompt is explicit and detailed in its request. In this case, very little guidance was given to create the prompt, so it worked better. (In fact, this prompt comes from Ethan Mollick who has used it to evaluate AI chatbots.)

This is the risk of general-purpose AI applications like Operator. The quality of the result relies heavily on the use case and specificity provided by the user. An application with a more specific use case allows for more design guidance and can produce better output more reliably.

The limitations of deep research

Newton also reviewed deep research, which, according to OpenAI’s website, is an “agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you.”

Deep research came out after Newton’s review of Operator. Newton chose an intentionally tricky prompt that prods at some of the tool’s limitations regarding fresh information and non-general information: “I wanted to see how OpenAI’s agent would perform given that it was researching a story that was less than a day old, and for which much of the coverage was behind paywalls that the agent would not be able to access. And indeed, the bot struggled more than I expected.”

Near the end of the article, Newton elaborates on some of the shortcomings he noticed with deep research.

OpenAI’s deep research suffers from the same design problem that almost all AI products have: its superpowers are completely invisible and must be harnessed through a frustrating process of trial and error.

Generally speaking, the more you already know about something, the more useful I think deep research is. This may be somewhat counterintuitive; perhaps you expected that an AI agent would be well suited to getting you up to speed on an important topic that just landed on your lap at work, for example. 

In my early tests, the reverse felt true. Deep research excels for drilling deep into subjects you already have some expertise in, letting you probe for specific pieces of information, types of analysis, or ideas that are new to you.

The “frustrating trial and error” shows a mismatch between Newton’s expectations and a necessary aspect of many generative AI applications. A good response requires more information than the user will probably give in the first attempt. The challenge is to design the application and set the user’s expectations so that this interaction is not frustrating but exciting.

Newton’s more poignant criticism is that the application requires already knowing something about the topic for it to work well. From the perspective of our framework, this makes sense. The more you know about a topic, the more detail you can provide. And as you iterate, having knowledge about a topic helps you observe and evaluate the output. Without the ability to describe it well or evaluate the results, the user is less likely to use the tool to generate good output.

A version of deep research designed for lawyers to perform legal research could be powerful. Lawyers have an extensive and common vocabulary for describing legal matters, and they’re more likely to see a result and know if it makes sense. Generative AI tools are fallible, though. So, the tool should focus on a generation-evaluation loop rather than writing a final draft of a legal document.

The article also highlights many improvements compared to Operator. Most notably, the bot asked clarifying questions. This is the most impressive aspect of the tool. Undoubtedly, it helps that deep search has a focused use-case of retrieving and summarizing general information instead of a does-it-all approach. Having a focused use case narrows the set of likely interactions, letting you design better guidance into the prompt flow.

Good application design with generative AI

Designing effective generative AI applications requires thoughtful consideration of how users interact with the technology, the types of information they need, and the limitations of the underlying models. Here are some key principles to guide the design of generative AI tools:

1. Constrain the input and focus on providing details

Applications are inputs and outputs. We want the outputs to be useful and pleasant. By giving a user a conversational chatbot interface, we allow for a vast surface area of potential inputs, making it a challenge to guarantee useful outputs. One strategy is to limit or guide the input to a more manageable subset.

For example, FigJam, a collaborative whiteboarding tool, uses pre-set template prompts for timelines, Gantt charts, and other common whiteboard artifacts. This provides some structure and predictability to the inputs. Users still have the freedom to describe further details like color or the content for each timeline event. This approach ensures that the AI has enough specificity to generate meaningful outputs while giving users creative control.

2. Design frequent iteration and evaluation into the tool

Iterating in a tight generation-evaluation loop is essential for refining outputs and ensuring they meet user expectations. OpenAI’s Dall-E is great at this. Users quickly iterate on image prompts and refine their descriptions to add additional detail. If you type “a picture of a cheeseburger on a plate”, you may then add more detail by specifying “with pepperjack cheese”.

AI code generating tools work well because users can run a generated code snippet immediately to see if it works, enabling rapid iteration and validation. This quick evaluation loop produces better results and a better coder experience. 

Designers of generative AI applications should pull the user in the loop early, often, in a way that is engaging rather than frustrating. Designers should also consider the user’s knowledge level. Users with domain expertise can iterate more effectively.

Referring back to the FigJam example, the prompts and icons in the app quickly communicate “this is what we call a mind map” or “this is what we call a gantt chart” for users who want to generate these artifacts but don’t know the terms for them. Giving the user some basic vocabulary can help them better generate desired results quickly with less frustration.

3. Be mindful of the types of information needed

LLMs excel at tasks involving general knowledge already in the base training set. For example, writing class assignments involves absorbing general information, synthesizing it, and producing a written output, so LLMs are very well-suited for that task.

Use cases that require non-general information are more complex. Some questions the designer and engineer should ask include:

  • Does this application require fresh information? Maybe this is knowledge of current events or a user’s current bank account balance. If so, that information needs to be retrieved and incorporated into the model.
  • How much non-general information does the LLM need to know? If it’s a lot of information—like a corpus of company documentation and communication—then the model may need to be fine tuned in batch ahead of time. If the information is relatively small, a retrieval augmented generation (RAG) approach at query time may suffice. 
  • How many sources of non-general information—small and finite or potentially infinite? General purpose agents like Operator face the challenge of potentially infinite non-general information sources. Depending on what the user requires, it could need to access their contacts, restaurant reservation lists, financial data, or even other people’s calendars. A single-purpose restaurant reservation chatbot may only need access to Yelp, OpenTable, and the user’s calendar. It’s much easier to reconcile access and authentication for a handful of known data sources.
  • Is there context-specific information that can only come from the user? Consider our restaurant reservation chatbot. Is the user making reservations for just themselves? Probably not. “How many people and who” is a detail that only the user can provide, an example of non-public information that only the user knows. We shouldn’t expect the user to provide this information upfront and unguided. Instead, we can use prompt suggestions so they include the information. We may even be able to design the LLM to ask these questions when the detail is not provided.

4. Focus on specific use cases

Broad, all-purpose chatbots often struggle to deliver consistent results due to the complexity and variability of user needs. Instead, focus on specific use cases where the AI’s shortcomings can be mitigated through thoughtful design.

Narrowing the scope helps us address many of the issues above.

  • We can identify common requests for the use case and incorporate those into prompt suggestions.
  • We can design an iteration loop that works well with the type of thing we’re generating.
  • We can identify sources of non-general information and devise solutions to incorporate it into the model or prompt.

5. Translation or summary tasks work well

A common task for ChatGPT is to rewrite something in a different style, explain what some computer code is doing, or summarize a long document. These tasks involve converting a set of information from one form to another.

We have the same concerns about non-general information and context. For instance, a Chatbot asked to explain a code script doesn’t know the system that script is part of unless that information is provided.

But in general, the task of transforming or summarizing information is less prone to missing details. By definition, you have provided the details it needs. The result should have the same information in a different or more condensed form.

The exception to the rules

There is a case when it doesn’t matter if you break any or all of these rules—when you’re just having fun. LLMs are creative tools by nature. They can be an easel to paint on, a sandbox to build in, a blank sheet to scribe. Iteration is still important; the user wants to see the thing they’re creating as they create it. But unexpected results due to lack of information or omitted details may add to the experience. If you ask for a cheeseburger recipe, you might get some funny or interesting ingredients. If the stakes are low and the process is its own reward, don’t worry about the rules.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Nvidia launches research center to accelerate quantum computing breakthrough

The new research center aims to tackle quantum computing’s most significant challenges, including qubit noise reduction and the transformation of experimental quantum processors into practical devices. “By combining quantum processing units (QPUs) with state-of-the-art GPU technology, Nvidia hopes to accelerate the timeline to practical quantum computing applications,” the statement added.

Read More »

Keysight network packet brokers gain AI-powered features

The technology has matured considerably since then. Over the last five years, Singh said that most of Keysight’s NPB customers are global Fortune 500 organizations that have large network visibility practices. Meaning they deploy a lot of packet brokers with capabilities ranging anywhere from one gigabit networking at the edge,

Read More »

Adding, managing and deleting groups on Linux

$ sudo groupadd -g 1111 techs In this case, a specific group ID (1111) is being assigned. Omit the -g option to use the next available group ID (e.g., sudo groupadd techs). Once a group is added, you will find it in the /etc/group file. $ grep techs /etc/grouptechs:x:1111: Adding

Read More »

Global Oil Demand Makes Strong Start to 2025

Global oil demand has made a strong start to 2025, analysts at Standard Chartered Bank, including Commodities Research Head Paul Horsnell, said in a report sent to Rigzone by Horsnell on Thursday. “Based on a variety of national sources and the 19 March Joint Organizations Data Initiative (JODI) release, we estimate that demand averaged 102.77 million barrels per day in January, a year on year increase of 2.19 million barrels per day,” the Standard Chartered Bank analysts said in the report. “This is in line with the U.S. Energy Information Administration (EIA) estimate for January that put demand at 102.74 million barrels per day and growth at 1.85 million barrels per day,” they added. In the report, the Standard Chartered Bank analysts noted that January is usually the seasonal low point for global demand and said they expect demand “to move above 105.0 million barrels per day for the first time in June before reaching a 2025 high of 105.6 million barrels per day in August”. “Our forecast for 2025 demand growth stands at 1.41 million barrels per day. After weakening in H2-2024, our forecast is now back where it stood at its initiation in January 2024,” they added. “While the main downside risk to demand comes from U.S. tariff policy and the economic uncertainty it creates, for now demand-side fundamentals appear robust despite negative sentiment,” they continued. The Standard Chartered Bank analysts stated in the report that they expect global demand to exceed supply by 0.9 million barrels per day in the second quarter and by 0.5 million barrels per day in the third quarter. Rigzone has contacted the Trump transition team and the White House for comment on Standard Chartered Bank’s report. At the time of writing, neither have responded to Rigzone. In a research note sent to

Read More »

Voyager Midstream Acquires Phillips 66 Stake in Panola NGL Pipeline

Voyager Midstream Holdings LLC has completed the purchase of a non-operating stake in a Texas natural gas liquid (NGL) pipeline from Phillips 66. The 254-mile Panola Pipeline, operated by Enterprise Products Partners LP, transports Y-Grade NGLs from Panola County to fractionation facilities in Mont Belvieu city. “Voyager’s interest in the Panola Pipeline is a strategic fit with the company’s existing footprint in East Texas and North Louisiana”, Houston, Texas-based Voyager said in an online statement, referring to assets also acquired from Phillips 66. Voyager chief executive Will Harvey commented, “Panola Pipeline is a critical NGL pipeline connecting the major East Texas gas processing complexes and Gulf Coast demand markets”. “We are excited to work alongside our partners in Panola Pipeline to safely transport liquids to satisfy growing demand for NGLs along the Gulf Coast”, Harvey added. Voyager did not disclose the financial details of the transaction. It said that in conjunction with the acquisition, it has entered into a credit facility with the Bank of Oklahoma. “This credit facility, along with existing equity commitments from Pearl, provides Voyager with substantial flexibility and capital to continue growing its business in support of its customers”, it said. Pearl Energy Investments launched Voyager in 2023 as a platform for the acquisition and development of crude oil, natural gas and produced water infrastructure across key basins in North America. Voyager operates about 550 miles of natural gas pipelines and associated compression. It also has 400 million cubic feet a day of cryogenic gas processing capacity and 12,000 barrels per day of liquid fractionation capacity. It also operates Carthage Hub, a gas trading and delivery hub capable of handling over 1 billion cubic feet per day. Carthage Hub interconnects multiple markets across the United States including LNG markets in Texas and Louisiana. All of these

Read More »

BP makes first divestment of target $20bn asset sale

Energy firm BP has sold a stake worth $1 billion (£733m) in the Trans-Anatolian Natural Gas Pipeline (TANAP) to Apollo as part of the first tranche of a $20bn asset sale target. The US asset manager will take a 25% non-operated stake in BP Pipelines (TANAP) (BP TANAP) which itself holds BP’s 12% interest in TANAP, owner and operator of the pipeline that carries natural gas from Azerbaijan across Turkey. The sale comes after BP chief executive Murray Auchincloss unveiled plans to review assets for a potential sale including its core lubricants business, Castrol and its solar business, BP Lightsource. The $20bn target was announced alongside a “fundamental reset” for the firm as it turns focus to its traditional oil and gas production business. The deal marks the second such sale agreed with US fund manager, Apollo. Last year Apollo snapped up another $1bn BP-owned stake in Trans Adriatic Pipeline (TAP). TANAP, running for approximately 1,120 miles (1,800km) across Turkey, is the central section of the Southern Gas Corridor project (SGC) pipeline system. The SGC transports gas from the BP-operated Shah Deniz gas field in the Azerbaijan sector of the Caspian Sea to markets in Europe, including Italy and Greece. It connects to TAP at the Greek-Turkish border, which crosses Northern Greece, Albania and the Adriatic Sea before coming ashore in Southern Italy to connect to the Italian natural gas network. BP said the deal allows it to “monetise” its interest in TANAP while retaining control of the asset. BP executive vice president for gas and low carbon energy William Lin said: “This unlocks capital from our global portfolio while retaining our role in this strategic asset for bringing Azerbaijan gas to Europe. BP and Apollo will continue to explore further strategic cooperation and mutually beneficial opportunities.” Apollo partner Skardon Baker

Read More »

‘First Major Project’ for GB Energy Announced

A release posted on the UK government website on Friday announced that the “first major project” for Great British Energy (GB Energy) “is to put rooftop solar panels on around 200 schools and 200 NHS sites, saving hundreds of millions on their energy bills”. Hundreds of schools, NHS trusts and communities across the UK will benefit from new rooftop solar power and renewable schemes to save money on their energy bills, thanks to a total GBP 200 million ($258.6 million) investment from the UK government and Great British Energy, the release stated. “In England around GBP 80 million ($103.4 million) in funding will support around 200 schools, alongside GBP 100 million ($129.3 million) for nearly 200 NHS sites, covering a third of NHS trusts, to install rooftop solar panels that could power classrooms and operations, with potential to sell leftover energy back to the grid,” the release noted. The first panels are expected to be in schools and hospitals by the end of summer 2025, according to the release. The release stated that local authorities and community energy groups will also be supported by nearly GBP 12 million ($15.5 million) to help build local clean energy projects. A further GBP 9.3 million ($12.0 million) will power schemes in Scotland, Wales, and Northern Ireland including community energy or rooftop solar for public buildings, the release added. “Great British Energy’s first major project will be to help our vital public institutions save hundreds of millions on bills to reinvest on the frontline,” Energy Secretary Ed Miliband said in the release. “Great British Energy will provide power for pupils and patients,” he added. “Parents at the school gate and patients in hospitals will experience the difference Great British Energy can make. This is our clean energy superpower mission in action, with lower bills

Read More »

Raymond James Sees Biggest Crop of New Oil Projects in a Decade

A flurry of oil projects from Brazil to Saudi Arabia are set to come online this year, providing the biggest infusion of new crude production in more than a decade.  Fresh oil field output is expected to total about 2.9 million barrels a day in 2025, up from about 800,000 barrels last year, according to data from Raymond James. That’s the most in data stretching back to 2015. Among the largest projects are the Tengiz field in Kazakhstan and Bacalhau in Brazil, as well as the Berri and Marjan expansions in Saudi Arabia. The projections for this year and next are subject to delays, and could change. Global oil forecasters have been projecting a supply overhang for 2025 as countries including Guyana and Brazil bring on new output and OPEC+ plans to start reviving idled output in April. Meanwhile, US President Donald Trump’s trade policies have fanned concerns about reduced global energy demand. The US Energy Information Administration projects supply will exceed demand by 100,000 barrels a day this year, and the International Energy Agency sees a surplus of 600,000 barrels a day. While Raymond James didn’t provide full forecasts for production and consumption, the firm projects that supply will outstrip demand by 280,000 barrels a day toward the end of 2025.  “Investors have not fully grasped just how much new supply from projects is on deck in 2025,” said Pavel Molchanov, an analyst at Raymond James. What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is a new social experience created for you and all energy professionals to Speak Up about our industry, share knowledge, connect with peers and industry insiders and engage in a professional community that will empower your career in energy. MORE FROM THIS

Read More »

Net zero cannot be ‘in isolation’ from fossil fuels, industry chief to say

The drive for net zero cannot be “in isolation from the hydrocarbon sector”, the head of the CBI (Confederation of British Industry) will say today. Rain Newton-Smith is due to speak at the group’s annual lunch in Edinburgh, alongside First Minister John Swinney, where she will laud the oil and gas industry as “the bridge” to net zero. But the business boss will also lament government action which has hit the fossil fuels industry. The speech comes at a time when the idea of net zero is becoming unpopular with the Conservatives and a surging Reform UK – which has made opposing them a key plank of their offering to the public. “Despite the voices being raised against net zero, the fact is Scotland is sitting on a goldmine of green energy,” Newton-Smith is expected to say. “The numbers don’t lie. The opportunities are there. “Since 2022, Scotland’s net-zero sector has grown 20% and created 16,000 more jobs while average UK growth has near-flatlined. “So, let me be crystal clear. Business is behind net zero. Business is invested in our energy transition. And we’re behind the plans to go further. “But we can’t see net zero in isolation from the hydrocarbon sector. “Especially in Scotland. Oil and gas are still tens of thousands of jobs here. From the latest data it still makes up over 10% of Scotland’s GDP. “It will still be a part of the energy mix and the bridge to net zero, for some time yet. The infrastructure, the investment, the skills and knowledge of these industries will be mission critical for the transition. “But too often, they have been left out of the picture, hit by repeated tax changes and uncertainty. “On one hand, we need clear timelines and funding for net-zero commitments government has already

Read More »

PEAK:AIO adds power, density to AI storage server

There is also the fact that many people working with AI are not IT professionals, such as professors, biochemists, scientists, doctors, clinicians, and they don’t have a traditional enterprise department or a data center. “It’s run by people that wouldn’t really know, nor want to know, what storage is,” he said. While the new AI Data Server is a Dell design, PEAK:AIO has worked with Lenovo, Supermicro, and HPE as well as Dell over the past four years, offering to convert their off the shelf storage servers into hyper fast, very AI-specific, cheap, specific storage servers that work with all the protocols at Nvidia, like NVLink, along with NFS and NVMe over Fabric. It also greatly increased storage capacity by going with 61TB drives from Solidigm. SSDs from the major server vendors typically maxed out at 15TB, according to the vendor. PEAK:AIO competes with VAST, WekaIO, NetApp, Pure Storage and many others in the growing AI workload storage arena. PEAK:AIO’s AI Data Server is available now.

Read More »

SoftBank to buy Ampere for $6.5B, fueling Arm-based server market competition

SoftBank’s announcement suggests Ampere will collaborate with other SBG companies, potentially creating a powerful ecosystem of Arm-based computing solutions. This collaboration could extend to SoftBank’s numerous portfolio companies, including Korean/Japanese web giant LY Corp, ByteDance (TikTok’s parent company), and various AI startups. If SoftBank successfully steers its portfolio companies toward Ampere processors, it could accelerate the shift away from x86 architecture in data centers worldwide. Questions remain about Arm’s server strategy The acquisition, however, raises questions about how SoftBank will balance its investments in both Arm and Ampere, given their potentially competing server CPU strategies. Arm’s recent move to design and sell its own server processors to Meta signaled a major strategic shift that already put it in direct competition with its own customers, including Qualcomm and Nvidia. “In technology licensing where an entity is both provider and competitor, boundaries are typically well-defined without special preferences beyond potential first-mover advantages,” Kawoosa explained. “Arm will likely continue making independent licensing decisions that serve its broader interests rather than favoring Ampere, as the company can’t risk alienating its established high-volume customers.” Industry analysts speculate that SoftBank might position Arm to focus on custom designs for hyperscale customers while allowing Ampere to dominate the market for more standardized server processors. Alternatively, the two companies could be merged or realigned to present a unified strategy against incumbents Intel and AMD. “While Arm currently dominates processor architecture, particularly for energy-efficient designs, the landscape isn’t static,” Kawoosa added. “The semiconductor industry is approaching a potential inflection point, and we may witness fundamental disruptions in the next 3-5 years — similar to how OpenAI transformed the AI landscape. SoftBank appears to be maximizing its Arm investments while preparing for this coming paradigm shift in processor architecture.”

Read More »

Nvidia, xAI and two energy giants join genAI infrastructure initiative

The new AIP members will “further strengthen the partnership’s technology leadership as the platform seeks to invest in new and expanded AI infrastructure. Nvidia will also continue in its role as a technical advisor to AIP, leveraging its expertise in accelerated computing and AI factories to inform the deployment of next-generation AI data center infrastructure,” the group’s statement said. “Additionally, GE Vernova and NextEra Energy have agreed to collaborate with AIP to accelerate the scaling of critical and diverse energy solutions for AI data centers. GE Vernova will also work with AIP and its partners on supply chain planning and in delivering innovative and high efficiency energy solutions.” The group claimed, without offering any specifics, that it “has attracted significant capital and partner interest since its inception in September 2024, highlighting the growing demand for AI-ready data centers and power solutions.” The statement said the group will try to raise “$30 billion in capital from investors, asset owners, and corporations, which in turn will mobilize up to $100 billion in total investment potential when including debt financing.” Forrester’s Nguyen also noted that the influence of two of the new members — xAI, owned by Elon Musk, along with Nvidia — could easily help with fundraising. Musk “with his connections, he does not make small quiet moves,” Nguyen said. “As for Nvidia, they are the face of AI. Everything they do attracts attention.” Info-Tech’s Bickley said that the astronomical dollars involved in genAI investments is mind-boggling. And yet even more investment is needed — a lot more.

Read More »

IBM broadens access to Nvidia technology for enterprise AI

The IBM Storage Scale platform will support CAS and now will respond to queries using the extracted and augmented data, speeding up the communications between GPUs and storage using Nvidia BlueField-3 DPUs and Spectrum-X networking, IBM stated. The multimodal document data extraction workflow will also support Nvidia NeMo Retriever microservices. CAS will be embedded in the next update of IBM Fusion, which is planned for the second quarter of this year. Fusion simplifies the deployment and management of AI applications and works with Storage Scale, which will handle high-performance storage support for AI workloads, according to IBM. IBM Cloud instances with Nvidia GPUs In addition to the software news, IBM said its cloud customers can now use Nvidia H200 instances in the IBM Cloud environment. With increased memory bandwidth (1.4x higher than its predecessor) and capacity, the H200 Tensor Core can handle larger datasets, accelerating the training of large AI models and executing complex simulations, with high energy efficiency and low total cost of ownership, according to IBM. In addition, customers can use the power of the H200 to process large volumes of data in real time, enabling more accurate predictive analytics and data-driven decision-making, IBM stated. IBM Consulting capabilities with Nvidia Lastly, IBM Consulting is adding Nvidia Blueprint to its recently introduced AI Integration Service, which offers customers support for developing, building and running AI environments. Nvidia Blueprints offer a suite pre-validated, optimized, and documented reference architectures designed to simplify and accelerate the deployment of complex AI and data center infrastructure, according to Nvidia.  The IBM AI Integration service already supports a number of third-party systems, including Oracle, Salesforce, SAP and ServiceNow environments.

Read More »

Nvidia’s silicon photonics switches bring better power efficiency to AI data centers

Nvidia typically uses partnerships where appropriate, and the new switch design was done in collaboration with multiple vendors across different aspects, including creating the lasers, packaging, and other elements as part of the silicon photonics. Hundreds of patents were also included. Nvidia will licensing the innovations created to its partners and customers with the goal of scaling this model. Nvidia’s partner ecosystem includes TSMC, which provides advanced chip fabrication and 3D chip stacking to integrate silicon photonics into Nvidia’s hardware. Coherent, Eoptolink, Fabrinet, and Innolight are involved in the development, manufacturing, and supply of the transceivers. Additional partners include Browave, Coherent, Corning Incorporated, Fabrinet, Foxconn, Lumentum, SENKO, SPIL, Sumitomo Electric Industries, and TFC Communication. AI has transformed the way data centers are being designed. During his keynote at GTC, CEO Jensen Huang talked about the data center being the “new unit of compute,” which refers to the entire data center having to act like one massive server. That has driven compute to be primarily CPU based to being GPU centric. Now the network needs to evolve to ensure data is being fed to the GPUs at a speed they can process the data. The new co-packaged switches remove external parts, which have historically added a small amount of overhead to networking. Pre-AI this was negligible, but with AI, any slowness in the network leads to dollars being wasted.

Read More »

Critical vulnerability in AMI MegaRAC BMC allows server takeover

“In disruptive or destructive attacks, attackers can leverage the often heterogeneous environments in data centers to potentially send malicious commands to every other BMC on the same management segment, forcing all devices to continually reboot in a way that victim operators cannot stop,” the Eclypsium researchers said. “In extreme scenarios, the net impact could be indefinite, unrecoverable downtime until and unless devices are re-provisioned.” BMC vulnerabilities and misconfigurations, including hardcoded credentials, have been of interest for attackers for over a decade. In 2022, security researchers found a malicious implant dubbed iLOBleed that was likely developed by an APT group and was being deployed through vulnerabilities in HPE iLO (HPE’s Integrated Lights-Out) BMC. In 2018, a ransomware group called JungleSec used default credentials for IPMI interfaces to compromise Linux servers. And back in 2016, Intel’s Active Management Technology (AMT) Serial-over-LAN (SOL) feature which is part of Intel’s Management Engine (Intel ME), was exploited by an APT group as a covert communication channel to transfer files. OEM, server manufacturers in control of patching AMI released an advisory and patches to its OEM partners, but affected users must wait for their server manufacturers to integrate them and release firmware updates. In addition to this vulnerability, AMI also patched a flaw tracked as CVE-2024-54084 that may lead to arbitrary code execution in its AptioV UEFI implementation. HPE and Lenovo have already released updates for their products that integrate AMI’s patch for CVE-2024-54085.

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).  In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple would between them devote $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are way higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skill labor shortage

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based John Deere has been in business for 187 years, yet it’s been a regular as a non-tech company showing off technology at the big tech trade show in Las Vegas and is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually; and the agricultural work force continues to shrink. (This is my hint to the anti-immigration crowd). John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences their own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% percent of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More 2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »