A Comprehensive Guide to LLM Temperature 🔥🌡️

While building my own LLM-based application, I found many prompt engineering guides, but few equivalent guides for determining the temperature setting.

Of course, temperature is a simple numerical value while prompts can get mind-blowingly complex, so it may feel trivial as a product decision. Still, choosing the right temperature can dramatically change the nature of your outputs, and anyone building a production-quality LLM application should choose temperature values with intention.

In this post, we’ll explore what temperature is and the math behind it, potential product implications, and how to choose the right temperature for your LLM application and evaluate it. At the end, I hope that you’ll have a clear course of action to find the right temperature for every LLM use case.

What is temperature?

Temperature is a number that controls the randomness of an LLM’s outputs. Most APIs constrain the value to a range like 0 to 1 or 0 to 2 to keep the outputs within semantically coherent bounds.

From OpenAI’s documentation:

“Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.”

Intuitively, it’s like a dial that can adjust how “explorative” or “conservative” the model is when it spits out an answer.

What do these temperature values mean?

Personally, I find the math behind the temperature field very interesting, so I’ll dive into it. But if you’re already familiar with the innards of LLMs or you’re not interested in them, feel free to skip this section.

You probably know that an LLM generates text by predicting the next token after a given sequence of tokens. In its prediction process, it assigns probabilities to all possible tokens that could come next. For example, if the sequence passed to the LLM is “The giraffe ran over to the…”, it might assign high probabilities to words like “tree” or “fence” and lower probabilities to words like “apartment” or “book”.

But let’s back up a bit. How do these probabilities come to be?

These probabilities usually come from raw scores, known as logits, that are the results of many, many neural network calculations and other machine learning techniques. These logits are gold; they contain all the valuable information about what tokens could be selected next. But the problem with these logits is that they don’t fit the definition of a probability: they can be any number, positive or negative, like 2, or -3.65, or 20. They’re not necessarily between 0 and 1, and they don’t necessarily all add up to 1 like a nice probability distribution.

So, to make these logits usable, we need to use a function to transform them into a clean probability distribution. The function typically used here is called the softmax, and it’s essentially an elegant equation that does two important things:

  1. It turns all the logits into positive numbers.
  2. It scales the logits so they add up to 1.
The softmax formula: softmax(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)

The softmax function works by taking each logit, raising e (around 2.718) to the power of that logit, and then dividing by the sum of all these exponentials. So the highest logit will still get the highest numerator, which means it gets the highest probability. But other tokens, even with negative logit values, will still get a chance.
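To make this concrete, here is a minimal sketch of the softmax in Python with NumPy; the logit values are invented for illustration:

```python
import numpy as np

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    # Subtracting the max before exponentiating is a standard trick for
    # numerical stability; it cancels out in the ratio and doesn't change the result.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Hypothetical logits for four candidate next tokens,
# e.g. "tree", "fence", "apartment", "book".
logits = np.array([2.0, 1.0, -3.65, 0.5])
probs = softmax(logits)

print(probs)        # every value is positive, highest logit gets highest probability
print(probs.sum())  # the values add up to 1
```

Note that even the token with the negative logit ends up with a small but nonzero probability, which is exactly the "still gets a chance" behavior described above.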

Now here’s where Temperature comes in: temperature modifies the logits before applying softmax. The formula for softmax with temperature is:

Softmax with temperature: softmax(zᵢ, T) = e^(zᵢ / T) / Σⱼ e^(zⱼ / T)

When the temperature is low (T < 1), dividing the logits by T pushes the values further apart. The exponentiation then makes the highest value much larger than the others, so the probability distribution becomes more peaked. The model has a higher chance of picking the most probable token, resulting in a more deterministic output.

When the temperature is high (T > 1), dividing the logits by T pulls all the values closer together, flattening the probability distribution. This means the model is more likely to pick less probable tokens, increasing randomness.

How to choose temperature

Of course, the best way to choose a temperature is to play around with it. I believe any temperature, like any prompt, should be substantiated with example runs and evaluated against other possibilities. We’ll discuss that in the next section.

But before we dive into that, I want to highlight that temperature is a crucial product decision, one that can significantly influence user behavior. It may seem rather straightforward to choose: lower for more accuracy-based applications, higher for more creative applications. But there are tradeoffs in both directions with downstream consequences for user trust and usage patterns. Here are some subtleties that come to mind:

  • Low temperatures can make the product feel authoritative. More deterministic outputs can create the illusion of expertise and foster user trust. However, this can also lead to gullible users. If responses are always confident, users might stop critically evaluating the AI’s outputs and just blindly trust them, even if they’re wrong.
  • Low temperatures can reduce decision fatigue. If you see one strong answer instead of many options, you’re more likely to take action without overthinking. This might lead to easier onboarding or lower cognitive load while using the product. Inversely, high temperatures could create more decision fatigue and lead to churn.
  • High temperatures can encourage user engagement. The unpredictability of high temperatures can keep users curious (like variable rewards), leading to longer sessions or increased interactions. Inversely, low temperatures might create stagnant user experiences that bore users.
  • Temperature can affect the way users refine their prompts. When answers are unexpected with high temperatures, users might be driven to clarify their prompts. But with low temperatures, users may be forced to add more detail or expand on their prompts in order to get new answers.

These are broad generalizations, and of course there are many more nuances with every specific application. But in most applications, the temperature can be a powerful variable to adjust in A/B testing, something to consider alongside your prompts.

Evaluating different temperatures

As developers, we’re used to unit testing: defining a set of inputs, running those inputs through a function, and getting a set of expected outputs. We sleep soundly at night when we ensure that our code is doing what we expect it to do and that our logic is satisfying some clear-cut constraints.

The promptfoo package lets you perform the LLM-prompt equivalent of unit testing, but there’s some additional nuance. Because LLM outputs are non-deterministic and often designed to do more creative tasks than strictly logical ones, it can be hard to define what an “expected output” looks like.

Defining your “expected output”

The simplest evaluation tactic is to have a human rate how good they think some output is, according to some rubric. For outputs where you’re looking for a certain “vibe” that you can’t express in words, this will probably be the most effective method.

Another simple evaluation tactic is to use deterministic metrics — these are things like “does the output contain a certain string?” or “is the output valid JSON?” or “does the output satisfy this JavaScript expression?”. If your expected output can be expressed in these ways, promptfoo has your back.
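For instance, a deterministic assertion block in a promptfoo test might look like this sketch — `contains`, `is-json`, and `javascript` are promptfoo assertion types, while the variable and values here are invented for illustration:

```yaml
tests:
  - vars:
      persona: "a plant lover"
    assert:
      - type: contains        # does the output contain a certain string?
        value: "plant"
      - type: is-json         # is the output valid JSON?
      - type: javascript      # does the output satisfy this JS expression?
        value: "output.length < 280"
```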

A more interesting, AI-age evaluation tactic is to use LLM-graded checks. These essentially use LLMs to evaluate your LLM-generated outputs, and can be quite effective if used properly. Promptfoo offers these model-graded metrics in multiple forms. The whole list is in promptfoo’s documentation, and it contains assertions from “is the output relevant to the original query?” to “compare the different test cases and tell me which one is best!” to “where does this output rank on this rubric I defined?”.

Example

Let’s say I’m creating a consumer-facing application that comes up with creative gift ideas and I want to empirically determine what temperature I should use with my main prompt.

I might want to evaluate metrics like relevance, originality, and feasibility within a certain budget and make sure that I’m picking the right temperature to optimize those factors. If I’m comparing GPT-4o-mini’s performance with temperatures of 0 vs. 1, my test file might start like this:

providers:
  - id: openai:gpt-4o-mini
    label: openai-gpt-4o-mini-lowtemp
    config:
      temperature: 0
  - id: openai:gpt-4o-mini
    label: openai-gpt-4o-mini-hightemp
    config:
      temperature: 1
prompts:
  - "Come up with a one-sentence creative gift idea for a person who is {{persona}}. It should cost under {{budget}}."

tests:
  - description: "Mary - attainable, under budget, original"
    vars:
      persona: "a 40 year old woman who loves natural wine and plays pickleball"
      budget: "$100"
    assert:
      - type: g-eval
        value:
          - "Check if the gift is easily attainable and reasonable"
          - "Check if the gift is likely under $100"
          - "Check if the gift would be considered original by the average American adult"
  - description: "Sean - answer relevance"
    vars:
      persona: "a 25 year old man who rock climbs, goes to raves, and lives in Hayes Valley"
      budget: "$50"
    assert:
      - type: answer-relevance
        threshold: 0.7

I’ll probably want to run the test cases repeatedly to test the effects of temperature changes across multiple same-input runs. In that case, I would use the --repeat parameter:

promptfoo eval --repeat 3

Conclusion

Temperature is a simple numerical parameter, but don’t be deceived by its simplicity: it can have far-reaching implications for any LLM application.

Tuning it just right is key to getting the behavior you want — too low, and your model plays it too safe; too high, and it starts spouting unpredictable responses. With tools like promptfoo, you can systematically test different settings and find your Goldilocks zone — not too cold, not too hot, but just right.

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Broadcom scales up Ethernet with Tomahawk Ultra for low latency HPC and AI

Broadcom Support for minimum packet size allows streaming of those packets at full bandwidth. That capability is essential for efficient communication in scientific and computational workloads. It is particularly important for scale-up networks where GPU-to-switch-to-GPU communication happens in a single hop. Lossless Ethernet gets an ‘Ultra’ boost Another specific area

Read More »

Nvidia to restart H20 exports to China, unveils new export-compliant GPU

China re-entry impact Nvidia’s announcements mark a bid to re-enter the world’s second-largest AI market under tightened US export controls. But this return may not mean business as usual. “Despite Nvidia’s market re-entry, Chinese companies will likely continue diversifying suppliers to strengthen supply chain resilience,” said Prabhu Ram, VP of

Read More »

GAIL, OIL Add 15 Years to Their Existing Gas Sale and Purchase Deal

State-owned GAIL (India) Ltd. has extended its existing gas sale and purchase agreement with Oil India Ltd. (OIL) for another 15 years. GAIL said in a media release that the deal came into effect on July 1, 2025. Under the extended agreement, 900,000 standard cubic meters per day (31.7 million standard cubic feet per day) of natural gas will be supplied from OIL’s Bakhri Tibba Block of Rajasthan, which covers the Dandewala, Tanot and Bagi Tibba fields. “This agreement highlights the dedication of both Maharatna central public sector enterprises in production, transportation, and distribution of natural gas available from domestic gas fields demonstrating their collaborative approach to enhancing energy security and accessibility”, GAIL said.  “The sourced gas will be supplied to the state-run power plant of Rajasthan Rajya Vidyut Utpadan Nigam”, it said. GAIL owns and operates a 16,421-kilometer (10,200 miles) network of natural gas pipelines across India, transmitting over 127 million standard cubic meters per day (4.4 billion cubic feet per day) in fiscal year 2024-25, GAIL said. GAIL added it is simultaneously executing multiple pipeline projects to broaden its reach. GAIL also owns and manages a gas-based petrochemical complex at Pata, with capacities of 810 kilotons per annum (KTA) at Pata and 280 KTA at its subsidiary Brahmaputra Cracker and Polymer Ltd. GAIL also owns a liquefied natural gas (LNG) portfolio totaling 16.56 million tons per annum, representing 61 percent of India’s total LNG imports. Additionally GAIL, through its units and joint ventures, holds a significant market share in city gas distribution and is venturing into renewable energy sectors such as solar, wind and biofuels, according to the company. To contact the author, email [email protected] What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is

Read More »

Naftogaz Partners with Baker Hughes to Strengthen Ukraine’s Energy Sector

Naftogaz Group has signed a strategic memorandum of understanding (MoU) with Baker Hughes Co. to strengthen Ukraine’s energy sector. Naftogaz said in a media release that the two companies would explore new technical, operational, and commercial opportunities in key energy segments, including oil and gas extraction, transportation, storage, and processing, as well as geothermal projects, carbon capture technologies, and electricity generation. “Naftogaz Group is building collaboration frameworks that help Ukraine overcome wartime challenges, implement modern energy solutions, and strengthen energy independence. Baker Hughes has global expertise across the energy value chain, and we see great potential in this cooperation”, Sergii Koretskyi, Chief Executive Officer of Naftogaz, said. The collaboration will encompass a range of services and equipment vital for drilling and well construction. Additionally, Naftogaz said the cooperation targets critical advancements in environmental sustainability, focusing on solutions for emission reduction, carbon capture projects, and the development of hydrogen and geothermal energy initiatives. The partnership will also extend to enhancing digital services, automation, and analytics within operations, as well as providing specialized equipment for subsurface activities and production processes and developing software for asset management and drilling optimization. Naftogaz said that Baker Hughes signed a separate MoU with Naftogaz’s JSC Ukrtransgaz to assess the potential to leverage Baker Hughes’ gas technology equipment portfolio for underground gas storage and power generation projects in Ukraine. “Baker Hughes is committed to supporting Naftogaz Group and companies in Ukraine with its energy technology solutions portfolio and expertise. 
The ultimate goal of our companies is to contribute to the energy security and decarbonization of Ukraine’s energy sector and support the eventual reconstruction of the country’s economy through reliable and modern energy solutions”, Paolo Noccioni, President of Baker Hughes’ Nuovo Pignone, said. To contact the author, email [email protected] What do you think? We’d love to hear from

Read More »

CNOOC Marks Another Breakthrough in Buried Hills Exploration in SCS

CNOOC Ltd. said Wednesday it had achieved another “major breakthrough” in the exploration of buried hills in the South China Sea after drilling 3,362 meters (11,030.18 feet) deep in the Weizhou 10-5 South field. The field is in the Beibu Gulf, or the Gulf of Tonkin, in waters with an average depth of 37 meters, according to CNOOC Ltd., majority-owned by the state’s China National Offshore Oil Corp. Exploration well WZ10-5S-2d showed a 211-meter oil and gas pay zone. Tests yielded a production of 165,000 cubic feet of natural gas and 400 barrels of oil per day, CNOOC Ltd. said in a press release. The well “marks a major exploration breakthrough in the metamorphic sandstone and slate buried hills offshore China”, the announcement declared. “In recent years, CNOOC Ltd. has consistently intensified theoretical innovation and tackled key technology challenges in buried hills and deep plays exploration”, commented CNOOC Ltd. chief geologist Xu Changgui. “Breakthroughs have been achieved in the exploration of Paleozoic granite and Proterozoic metamorphic sandstone and slate buried hills within the Beibu Gulf Basin. “They demonstrate the vast exploration potential in buried hills formations, drive the secondary exploration process in mature areas, and mark the commencement of large-scale exploration of buried hills in the Beibu Gulf Basin”. CNOOC Ltd. chief executive Zhou Xinhuai added, “In the future, CNOOC Ltd. will continue to intensify research on key theories and technologies for deep play exploration, to enhance research and development capabilities, advance reserves and production growth, and to ensure stable supply of oil and gas”. In March CNOOC Ltd. claimed a breakthrough in the exploration of Paleozoic granite buried hills in the Beibu Gulf through an oil and gas discovery in the Weizhou 10-5 field. Drilled to a total depth of about 4,840 meters in an area with an average

Read More »

Crude Falls Amid Stronger Dollar, Supply Risk Doubts

Oil fell as the dollar strengthened and traders doubted US President Donald Trump’s plan to pressure Moscow would disrupt Russian exports. West Texas Intermediate slid 0.7% to settle above $66, extending Monday’s losses. Trump told reporters the US will impose a 19% tariff on goods from Indonesia after teasing the deal earlier in the day. The dollar strengthened, making commodities priced in the currency less attractive. Trump’s plan to pressure Russia into a ceasefire with Ukraine that was released Monday didn’t directly target energy infrastructure, a decision that brought some bears off of the sidelines. The administration intends to impose 100% tariffs on Russia only if the hostilities don’t end with a deal in 50 days, allaying fears of near-term supply tightness. “Since the start of the Ukraine war, it has become evident that halting Russian oil trade by targeting Russian sellers or the numerous shippers and payment intermediaries is nearly impossible,” JPMorgan Chase & Co. analysts led by Natasha Kaneva wrote in a note. Prices briefly popped on comments by US Energy Secretary Chris Wright that the US is considering creative ways to refill the Strategic Petroleum Reserve, before resuming their decline. Futures also came under pressure as investors liquidated their positions in WTI’s so-called prompt spread ahead of the contract expiry. US crude’s prompt spread — the difference between its two nearest contracts — held steady at around $1.16 a barrel in backwardation. While that’s still a bullish pattern, with nearer-term prices above those further out, it’s notably lower than Monday’s peak of $1.49. The gauge is set to be closely followed as the market’s focus shifts back to supply. The Organization of the Petroleum Exporting Countries partly pushed back against an International Energy Agency report that Saudi Arabia’ crude production surged in June. The cartel’s figures show

Read More »

Egypt Delays Some LNG Imports as New Terminals Yet to Start

Egypt is delaying some liquefied natural gas deliveries as its newest import facilities haven’t yet started operating. A small number of cargoes due to arrive in July are being rescheduled for next month, people familiar with the matter said, asking not to be identified discussing private information. The delay is not expected to be material or recurring, they said. The most populous Arab country has turned into a major LNG importer amid deteriorating domestic gas production and high consumption. Last month, Egyptian Natural Gas Holding Co. agreed on large LNG deals with suppliers including Saudi Aramco, Trafigura Group and Vitol Group, with contracts starting as soon as July and lasting as long as two and a half years. However, imports so far remain in line with recent months’ levels despite peak summer demand in the nation. Supplies have been coming in through the country’s only operational floating import terminal, the Hoegh Galleon vessel, ship-tracking data compiled by Bloomberg show.  Egypt has leased two more vessels for imports, Energos Power and Energos Eskimo. While the terminals have arrived in the country, and Egypt ordered expedited connection to the grid, they have not yet imported cargoes, according to the ship-tracking data. Both new terminals won’t be ready to receive LNG until the end of July, another person with knowledge of the matter said, as the onshore portion of the equipment is not ready yet. Egypt’s oil ministry didn’t respond to a request for comment. Egyptian Natural Gas Holding Co. declined to comment. WHAT DO YOU THINK? Generated by readers, the comments included herein do not reflect the views and opinions of Rigzone. All comments are subject to editorial review. Off-topic, inappropriate or insulting comments will be removed.

Read More »

China Refining Output Rebounds to Strongest Since 2023

China refined the most crude oil in nearly two years in June, as plants returned from seasonal maintenance to seize on better margins for fuels like diesel. Refining output rose to more than 15.2 million barrels a day, the strongest pace since September 2023, according to Bloomberg calculations based on figures released by the statistics bureau on Tuesday. Compared to June last year, volumes surged by 8.5 percent, reversing the declines seen in April and May. Improved margins and fewer idled units supported robust refining activity, with further strength expected this month as new plants come online, said Amy Sun, an analyst with GL Consulting, a think tank affiliated with Mysteel OilChem.  Diesel cracks, a measure of profitability, at independent refiners rose to nearly $18 a barrel at one point late last month, the highest since 2023, according to data tracked by consultant JLC International. Run rates at state-owned refineries soared to nearly 84 percent of capacity at end-June, the highest in more than three months, JLC’s data showed.  The refinery output is in line with the rise in crude purchases reported for June, which hit their highest level since August 2023 on a daily basis, according to Bloomberg calculations. Imports are expected to accelerate as the country adds as much as 140 million barrels of oil to replenish its Strategic Petroleum Reserves from later this year, Energy Aspects said in a note. What do you think? We’d love to hear from you, join the conversation on the Rigzone Energy Network. The Rigzone Energy Network is a new social experience created for you and all energy professionals to Speak Up about our industry, share knowledge, connect with peers and industry insiders and engage in a professional community that will empower your career in energy.

Read More »

Equinix, AWS embrace liquid cooling to power AI implementations

With AWS, it deployed In-Row Heat Exchangers (IRHX), a custom-built liquid cooling system designed specifically for servers using Nvidia’s Blackwell GPUs, it’s most powerful but also its hottest running processors used for AI training and inference. The IRHX unit has three components: a water‑distribution cabinet, an integrated pumping unit, and in‑row fan‑coil modules. It uses direct to chip liquid cooling just like the equinox servers, where cold‑plates attached to the chip draw heat from the chips and is cooled by the liquid. The warmed coolant then flows through the coils of heat exchangers, where high‑speed fans Blow on the pipes to cool them, like a car radiator. This type of cooling is nothing new, and there are a few direct to chip liquid cooling solutions on the market from Vertiv, CoolIT, Motivair, and Delta Electronics all sell liquid cooling options. But AWS separates the pumping unit from the fan-coil modules, letting a single pumping system to support large number of fan units. These modular fans can be added or removed as cooling requirements evolve, giving AWS the flexibility to adjust the system per row and site. This led to some concern that Amazon would disrupt the market for liquid cooling, but as a Dell’Oro Group analyst put it, Amazon develops custom technologies for itself and does not go into competition or business with other data center infrastructure companies.

Read More »

Intel CEO: We are not in the top 10 semiconductor companies

The Q&A session came on the heels of layoffs across the company. Tan was hired in March, and almost immediately he began to promise to divest and reduce non-core assets. Gelsinger had also begun divesting the company of losers, but they were nibbles around the edge. Tan is promising to take an axe to the place. In addition to discontinuing products, the company has outsourced marketing and media relations — for the first time in more than 25 years of covering this company, I have no internal contacts at Intel. Many more workers are going to lose their jobs in coming weeks. So far about 500 have been cut in Oregon and California but many more is expected — as much as 20% of the overall company staff may go, and Intel has over 100,000 employees, according to published reports. Tan believes the company is bloated and too bogged down with layers of management to be reactive and responsive in the same way that AMD and Nvidia are. “The whole process of that (deciding) is so slow and eventually nobody makes a decision,” he is quoted as saying. Something he has decided on is AI, and he seems to have decided to give up. “On training, I think it is too late for us,” Tan said, adding that Nvidia’s position in that market is simply “too strong.” So there goes what sales Gaudi3 could muster. Instead, Tan said Intel will focus on “edge” artificial intelligence, where AI capabilities Are brought to PCs and other remote devices rather than big AI processors in data centers like Nvidia and AMD are doing. “That’s an area that I think is emerging, coming up very big and we want to make sure that we capture,” Tan said.

Read More »

AMD: Latest news and insights

Survey: AMD continues to take server share from Intel May 20, 2025: AMD continues to take market share from Intel, growing at a faster rate and closing the gap between the two companies to the narrowest it has ever been. AMD, Nvidia partner with Saudi startup to build multi-billion dollar AI service centers May 15, 2025: As part of the avalanche of business deals that came from President Trump’s Middle East tour, both AMD and Nvidia have struck multi-billion dollar deals with an emerging Saudi AI firm. AMD targets hosting providers with affordable EPYC 4005 processors May 14, 2025: AMD launched its latest set of data center processors, targeting hosted IT service providers. The EPYC 4005 series is purpose-built with enterprise-class features and support for modern infrastructure technologies at an affordable price, the company said. Jio teams with AMD, Cisco and Nokia to build AI-enabled telecom platform March 18, 2025: Jio has teamed up with AMD, Cisco and Nokia to build an AI-enabled platform for telecom networks. The goal is to make networks smarter, more secure and more efficient to help service providers cut costs and develop new services. AMD patches microcode security holes after accidental early disclosure February 3, 2025: AMD issued two patches for severe microcode security flaws, defects that AMD said “could lead to the loss of Secure Encrypted Virtualization (SEV) protection.” The bugs were inadvertently revealed by a partner.

Read More »

Nvidia hits $4T market cap as AI, high-performance semiconductors hit stride

“The company added $1 trillion in market value in less than a year, a pace that surpasses Apple and Microsoft’s previous trajectories. This rapid ascent reflects how indispensable AI chipmakers have become in today’s digital economy,” Kiran Raj, practice head, Strategic Intelligence (Disruptor) at GlobalData, said in a statement. According to GlobalData’s Innovation Radar report, “AI Chips – Trends, Market Dynamics and Innovations,” the global AI chip market is projected to reach $154 billion by 2030, growing at a compound annual growth rate (CAGR) of 20%. Nvidia has much of that market, but it also has a giant bullseye on its back with many competitors gunning for its crown. “With its AI chips powering everything from data centers and cloud computing to autonomous vehicles and robotics, Nvidia is uniquely positioned. However, competitive pressure is mounting. Players like AMD, Intel, Google, and Huawei are doubling down on custom silicon, while regulatory headwinds and export restrictions are reshaping the competitive dynamics,” he said.

Read More »

Enterprises will strengthen networks to take on AI, survey finds

Private data centers: 29.5% Traditional public cloud: 35.4% GPU as a service specialists: 18.5% Edge compute: 16.6% “There is little variation from training to inference, but the general pattern is workloads are concentrated a bit in traditional public cloud and then hyperscalers have significant presence in private data centers,” McGillicuddy explained. “There is emerging interest around deploying AI workloads at the corporate edge and edge compute environments as well, which allows them to have workloads residing closer to edge data in the enterprise, which helps them combat latency issues and things like that. The big key takeaway here is that the typical enterprise is going to need to make sure that its data center network is ready to support AI workloads.” AI networking challenges The popularity of AI doesn’t remove some of the business and technical concerns that the technology brings to enterprise leaders. According to the EMA survey, business concerns include security risk (39%), cost/budget (33%), rapid technology evolution (33%), and networking team skills gaps (29%). Respondents also indicated several concerns around both data center networking issues and WAN issues. Concerns related to data center networking included: Integration between AI network and legacy networks: 43% Bandwidth demand: 41% Coordinating traffic flows of synchronized AI workloads: 38% Latency: 36% WAN issues respondents shared included: Complexity of workload distribution across sites: 42% Latency between workloads and data at WAN edge: 39% Complexity of traffic prioritization: 36% Network congestion: 33% “It’s really not cheap to make your network AI ready,” McGillicuddy stated. “You might need to invest in a lot of new switches and you might need to upgrade your WAN or switch vendors. You might need to make some changes to your underlay around what kind of connectivity your AI traffic is going over.” Enterprise leaders intend to invest in infrastructure

Read More »

CoreWeave acquires Core Scientific for $9B to power AI infrastructure push

Such a shift, analysts say, could offer short-term benefits for enterprises, particularly in cost and access, but also introduces new operational risks. “This acquisition may potentially lower enterprise pricing through lease cost elimination and annual savings, while improving GPU access via expanded power capacity, enabling faster deployment of Nvidia chipsets and systems,” said Charlie Dai, VP and principal analyst at Forrester. “However, service reliability risks persist during this crypto-to-AI retrofitting.” This also indicates that struggling vendors such as Core Scientific and similar have a way to cash out, according to Yugal Joshi, partner at Everest Group. “However, it does not materially impact the availability of Nvidia GPUs and similar for enterprises,” Joshi added. “Consolidation does impact the pricing power of vendors.” Concerns for enterprises Rising demand for AI-ready infrastructure can raise concerns among enterprises, particularly over access to power-rich data centers and future capacity constraints. “The biggest concern that CIOs should have with this acquisition is that mature data center infrastructure with dedicated power is an acquisition target,” said Hyoun Park, CEO and chief analyst at Amalgam Insights. “This may turn out to create challenges for CIOs currently collocating data workloads or seeking to keep more of their data loads on private data centers rather than in the cloud.”

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one ramping up its investments in AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs).

In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are far higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion.

The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping.

The Moline, Illinois-based John Deere has been in business for 187 years, yet it has become a regular at the big tech trade show in Las Vegas as a non-tech company showing off technology, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech.

The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.)

John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app.

While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs.

At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies and that recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail.

Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more.

The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model, because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks.

Going all-in on red teaming pays practical, competitive dividends

It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S. National Institute of Standards and Technology (NIST), all of which had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization.

OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »