Synthetic Data Generation with LLMs

Popularity of RAG

Over the past two years while working with financial firms, I’ve observed firsthand how they identify and prioritize Generative AI use cases, balancing complexity with potential value.

Retrieval-Augmented Generation (RAG) often stands out as a foundational capability across many LLM-driven solutions, striking a balance between ease of implementation and real-world impact. By combining a retriever that surfaces relevant documents with an LLM that synthesizes responses, RAG streamlines knowledge access, making it invaluable for applications like customer support, research, and internal knowledge management.

Defining clear evaluation criteria is key to ensuring LLM solutions meet performance standards, just as Test-Driven Development (TDD) ensures reliability in traditional software. Drawing from TDD principles, an evaluation-driven approach sets measurable benchmarks to validate and improve AI workflows. This becomes especially important for LLMs, where the complexity of open-ended responses demands consistent and thoughtful evaluation to deliver reliable results.

For RAG applications, a typical evaluation set includes representative input-output pairs that align with the intended use case. For example, in chatbot applications, this might involve Q&A pairs reflecting user inquiries. In other contexts, such as retrieving and summarizing relevant text, the evaluation set could include source documents alongside expected summaries or extracted key points. These pairs are often generated from a subset of documents, such as those that are most viewed or frequently accessed, ensuring the evaluation focuses on the most relevant content.
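For concreteness, a single record in such an evaluation set might look like the sketch below. The field names are illustrative rather than prescriptive, and the question and answer are taken from the example discussed later in this post.

```python
# One illustrative evaluation record for a RAG Q&A assistant (field names are hypothetical).
eval_record = {
    "question": "How has the distribution of AUM changed across different-sized Hybrid RIA firms?",
    "reference_answer": "Mid-sized firms ($25m to <$100m) saw their share of AUM decline from 2.3% to 1.0%.",
    "source_document": "2023 Cerulli report",  # the document the answer must be grounded in
}
```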

Key Challenges

Creating evaluation datasets for RAG systems has traditionally faced two major challenges.

  1. The process often relied on subject matter experts (SMEs) to manually review documents and generate Q&A pairs, making it time-intensive, inconsistent, and costly.
  2. LLMs were limited to text and could not process visual elements within documents, such as tables or diagrams. Standard OCR tools struggle to bridge this gap, often failing to extract meaningful information from non-textual content.

Multi-Modal Capabilities

The challenges of handling complex documents have evolved with the introduction of multimodal capabilities in foundation models. Commercial and open-source models can now process both text and visual content. This vision capability eliminates the need for separate text-extraction workflows, offering an integrated approach for handling mixed-media PDFs.

By leveraging these vision features, models can ingest entire pages at once, recognizing layout structures, chart labels, and table content. This not only reduces manual effort but also improves scalability and data quality, making it a powerful enabler for RAG workflows that rely on accurate information from a variety of sources.


Dataset Curation for Wealth Management Research Report

To demonstrate a solution to the problem of manual evaluation set generation, I tested my approach using a sample document — the 2023 Cerulli report. This type of document is typical in wealth management, where analyst-style reports often combine text with complex visuals. For a RAG-powered search assistant, a knowledge corpus like this would likely contain many such documents.

My goal was to demonstrate how a single document could be leveraged to generate Q&A pairs, incorporating both text and visual elements. While I didn’t define specific dimensions for the Q&A pairs in this test, a real-world implementation would involve providing details on types of questions (comparative, analysis, multiple choice), topics (investment strategies, account types), and many other aspects. The primary focus of this experiment was to ensure the LLM generated questions that incorporated visual elements and produced reliable answers.

POC Workflow

My workflow, illustrated in the diagram, leverages Anthropic’s Claude 3.5 Sonnet model, which simplifies working with PDFs by converting document pages into images before passing them to the model. This built-in functionality eliminates the need for additional third-party dependencies, streamlining the workflow and reducing code complexity.

I excluded preliminary pages of the report like the table of contents and glossary, focusing on pages with relevant content and charts for generating Q&A pairs. Below is the prompt I used to generate the initial question-answer sets.

You are an expert at analyzing financial reports and generating question-answer pairs. For the provided PDF, the 2023 Cerulli report:

1. Analyze pages {start_idx} to {end_idx} and for **each** of those 10 pages:
   - Identify the **exact page title** as it appears on that page (e.g., "Exhibit 4.03 Core Market Databank, 2023").
   - If the page includes a chart, graph, or diagram, create a question that references that visual element. Otherwise, create a question about the textual content.
   - Generate two distinct answers to that question ("answer_1" and "answer_2"), both supported by the page’s content.
   - Identify the correct page number as indicated in the bottom left corner of the page.
2. Return exactly 10 results as a valid JSON array (a list of dictionaries). Each dictionary should have the keys: "page" (int), "page_title" (str), "question" (str), "answer_1" (str), and "answer_2" (str). The page title typically includes the word "Exhibit" followed by a number.
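For reference, here is a minimal sketch of how a prompt like this could be sent along with the PDF, assuming the Anthropic Python SDK's Messages API and its document content blocks. The file name, model ID, and the abbreviated QA_PROMPT_TEMPLATE string are placeholders, not the exact artifacts from my workflow.

```python
import base64
import json
import anthropic

# The full prompt shown above, kept as a template with {start_idx} and {end_idx} placeholders.
QA_PROMPT_TEMPLATE = "You are an expert at analyzing financial reports ... pages {start_idx} to {end_idx} ..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("cerulli_2023.pdf", "rb") as f:  # placeholder file name
    pdf_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",    # placeholder model ID
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {   # the PDF itself, sent as a document block; the API rasterizes the pages internally
                "type": "document",
                "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64},
            },
            {"type": "text", "text": QA_PROMPT_TEMPLATE.format(start_idx=1, end_idx=10)},
        ],
    }],
)

qa_pairs = json.loads(response.content[0].text)  # expected: a JSON array of 10 dictionaries
```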

Q&A Pair Generation

To refine the Q&A generation process, I implemented a comparative learning approach that generates two distinct answers for each question. During the evaluation phase, these answers are assessed across key dimensions such as accuracy and clarity, with the stronger response selected as the final answer.

This approach mirrors how humans often find it easier to make decisions when comparing alternatives rather than evaluating something in isolation. It’s like an eye examination: the optometrist doesn’t ask whether your vision has improved or declined but instead presents two lenses and asks, “Which is clearer, option 1 or option 2?” This comparative process eliminates the ambiguity of assessing absolute improvement and focuses on relative differences, making the choice simpler and more actionable. Similarly, by presenting two concrete answer options, the system can more effectively evaluate which response is stronger.

This methodology is also cited as a best practice in the article “What We Learned from a Year of Building with LLMs” by leaders in the AI space. They highlight the value of pairwise comparisons, stating: “Instead of asking the LLM to score a single output on a Likert scale, present it with two options and ask it to select the better one. This tends to lead to more stable results.” I highly recommend reading their three-part series, as it provides invaluable insights into building effective systems with LLMs!

LLM Evaluation

For evaluating the generated Q&A pairs, I used Claude Opus for its advanced reasoning capabilities. Acting as a “judge,” the LLM compared the two answers generated for each question and selected the better option based on criteria such as directness and clarity. This approach is supported by research (Zheng et al., 2023) showing that LLMs can perform evaluations on par with human reviewers.
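Below is a minimal sketch of what such a pairwise judge call could look like, assuming the Anthropic Python SDK. The judge prompt wording, the judge_pair helper, and the model ID are illustrative rather than the exact prompt used in this experiment.

```python
import json
import anthropic

client = anthropic.Anthropic()

JUDGE_PROMPT = """You are judging two candidate answers to the same question about a financial report.
Question: {question}
Answer 1: {answer_1}
Answer 2: {answer_2}
Pick the answer that is more accurate, direct, and clear.
Respond with a JSON object: {{"winner": "answer_1" or "answer_2", "reason": "<one sentence>"}}."""

def judge_pair(qa: dict) -> dict:
    """Ask the judge model to pick the stronger of the two generated answers."""
    response = client.messages.create(
        model="claude-3-opus-20240229",  # placeholder judge model ID
        max_tokens=300,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**qa)}],
    )
    # The judge is instructed to return JSON; parse it into {"winner": ..., "reason": ...}.
    return json.loads(response.content[0].text)
```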

This approach significantly reduces the amount of manual review required by SMEs, enabling a more scalable and efficient refinement process. While SMEs remain essential during the initial stages to spot-check questions and validate system outputs, this dependency diminishes over time. Once a sufficient level of confidence is established in the system’s performance, the need for frequent spot-checking is reduced, allowing SMEs to focus on higher-value tasks.

Lessons Learned

Claude’s PDF capability has a limit of 100 pages, so I broke the original document into four 50-page sections. When I tried processing each 50-page section in a single request — and explicitly instructed the model to generate one Q&A pair per page — it still missed some pages. The token limit wasn’t the real problem; the model tended to focus on whichever content it considered most relevant, leaving certain pages underrepresented.

To address this, I experimented with processing the document in smaller batches, testing 5, 10, and 20 pages at a time. Through these tests, I found that batches of 10 pages (e.g., pages 1–10, 11–20, etc.) provided the best balance between precision and efficiency. Processing 10 pages per batch ensured consistent results across all pages while optimizing performance.
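The batching logic itself is straightforward. The sketch below assumes a hypothetical generate_qa_pairs() helper that wraps the Messages API request shown earlier (reusing pdf_b64 from that sketch) and returns the parsed list of Q&A dictionaries for one page range.

```python
# Iterate over the report in 10-page windows (1-10, 11-20, ...), one request per window.
# generate_qa_pairs() is a hypothetical wrapper around the Messages API call shown earlier.
total_pages = 50   # placeholder: pages in the current 50-page section
batch_size = 10

all_pairs = []
for start_idx in range(1, total_pages + 1, batch_size):
    end_idx = min(start_idx + batch_size - 1, total_pages)
    all_pairs.extend(generate_qa_pairs(pdf_b64, start_idx, end_idx))
```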

Another challenge was linking Q&A pairs back to their source. Using tiny page numbers in a PDF’s footer alone didn’t consistently work. In contrast, page titles or clear headings at the top of each page served as reliable anchors. They were easier for the model to pick up and helped me accurately map each Q&A pair to the right section.

Example Output

Below is an example page from the report, featuring two tables with numerical data. The following question was generated for this page:
Question: How has the distribution of AUM changed across different-sized Hybrid RIA firms?

Answer: Mid-sized firms ($25m to <$100m) experienced a decline in AUM share from 2.3% to 1.0%.

In the first table, the 2017 column shows a 2.3% share of AUM for mid-sized firms, which decreases to 1.0% in 2022, thereby showcasing the LLM’s ability to synthesize visual and tabular content accurately.

Benefits

Combining caching, batching and a refined Q&A workflow led to three key advantages:

Caching

  • In my experiment, processing a single report without caching would have cost $9 in input tokens; with caching, that dropped to $3, a threefold reduction. Per Anthropic’s pricing model, writing to the cache costs $3.75 per million tokens, while cache reads cost only $0.30 per million tokens. In contrast, regular input tokens cost $3 per million when caching is not used. A minimal caching sketch follows this list.
  • In a real-world scenario with more than one document, the savings become even more significant. For example, processing 10,000 research reports of similar length without caching would cost $90,000 in input costs alone. With caching, this cost drops to $30,000, achieving the same precision and quality while saving $60,000.
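Here is a minimal sketch of how the repeated document block could be marked as cacheable, assuming Anthropic's prompt caching via a cache_control field; pdf_b64 and QA_PROMPT_TEMPLATE carry over from the earlier sketches, and the model ID is a placeholder.

```python
import anthropic

client = anthropic.Anthropic()

# pdf_b64 and QA_PROMPT_TEMPLATE are built as in the earlier sketch.
document_block = {
    "type": "document",
    "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64},
    # The first request writes this block to the cache at the higher cache-write rate;
    # later requests that reuse the identical block are billed at the cheaper cache-read rate.
    "cache_control": {"type": "ephemeral"},
}

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [document_block,
                    {"type": "text", "text": QA_PROMPT_TEMPLATE.format(start_idx=11, end_idx=20)}],
    }],
)
```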

Discounted Batch Processing

  • Using Anthropic’s Batches API cuts output costs in half, making it a much cheaper option for certain tasks. Once I had validated the prompts, I ran a single batch job to evaluate all the Q&A answer sets at once. This method proved far more cost-effective than processing each Q&A pair individually.
  • For example, Claude 3 Opus typically costs $15 per million output tokens. With batching, this drops to $7.50 per million tokens, a 50% reduction. In my experiment, each Q&A pair generated an average of 100 output tokens, resulting in approximately 20,000 output tokens for the document. At the standard rate, this would have cost $0.30; with batch processing, the cost was reduced to $0.15, highlighting how this approach optimizes costs for non-sequential tasks like evaluation runs. A minimal batch-submission sketch follows this list.
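Below is a minimal sketch of submitting the judge comparisons as one batch job, assuming Anthropic's Message Batches API. JUDGE_PROMPT and all_pairs come from the earlier sketches, and the custom_id labels and model ID are arbitrary placeholders.

```python
import anthropic

client = anthropic.Anthropic()

# JUDGE_PROMPT and the list of generated Q&A dictionaries come from the earlier sketches.
requests = [
    {
        "custom_id": f"qa-{i}",  # arbitrary label used to match results back to each pair
        "params": {
            "model": "claude-3-opus-20240229",
            "max_tokens": 300,
            "messages": [{"role": "user", "content": JUDGE_PROMPT.format(**qa)}],
        },
    }
    for i, qa in enumerate(all_pairs)
]

batch = client.messages.batches.create(requests=requests)
print(batch.id, batch.processing_status)  # poll until the batch has ended, then retrieve results
```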

Time Saved for SMEs

  • With more accurate, context-rich Q&A pairs, Subject Matter Experts spent less time sifting through PDFs and clarifying details, and more time focusing on strategic insights. This approach also eliminates the need to hire additional staff or allocate internal resources for manually curating datasets, a process that can be time-consuming and expensive. By automating these tasks, companies save significantly on labor costs while streamlining SME workflows, making this a scalable and cost-effective solution.